Abstract


We study two-player zero-sum recursive games with a countable state space and finite
action spaces at each state. When the family of n-stage values $\{v_n, n \geq 1\}$ is totally bounded
for the uniform norm, we prove the existence of the uniform value. Together with a result
in Rosenberg and Vieille [12], we obtain a uniform Tauberian theorem for recursive games:
$(v_n)$ converges uniformly if and only if $(v_\lambda)$ converges uniformly.

We apply our main result to finite recursive games with signals (where players observe
only signals on the state and on past actions). When the maximizer is more informed
than the minimizer, we prove the Mertens conjecture $\mathrm{Maxmin} = \lim_{n\to\infty} v_n = \lim_{\lambda\to 0} v_\lambda$.
Finally, we deduce the existence of the uniform value in finite recursive games with symmetric
information.

Keywords: Stochastic games, recursive games, asymptotic value, uniform value, Tauberian 
theorem, maxmin 


1 Introduction 

Stochastic games were introduced by Shapley [13] to model a multiplayer dynamic interaction,
where players’ collective decisions influence the current payoff and also the future state. In 
this article, we focus on two-player zero-sum recursive games introduced by Everett [2]. The 
specificity of a recursive game is that the state space is divided into two sets: absorbing states 
and active states. On absorbing states, the process is absorbed and the payoff is fixed. On active 
(non-absorbing) states, the payoff is always equal to 0. 

There are several ways to evaluate the payoff stream in a zero-sum stochastic game. Given 
a positive integer n, the n-stage payoff is the expected average payoff during the first n stages. 
Given $\lambda \in (0,1]$, the $\lambda$-discounted payoff is the Abel mean of the infinite stage payoffs with a
weight $\lambda(1-\lambda)^{t-1}$ for stage $t$. We will focus on the concept of uniform value. A stochastic game
admits a uniform value if both players can approximately guarantee the same payoff level in all 
sufficiently long n-stage games without knowing a priori the length of the game. 

Mertens and Neyman [7] proved that a stochastic game with a finite state space and finite
sets of actions, where the players observe the current state and the stage payoffs, admits a uniform
value. Their proof uses the fact that the function $\lambda \mapsto v_\lambda$ has bounded variation, where $v_\lambda$ is the
$\lambda$-discounted value (Bewley and Kohlberg [1]). For stochastic games with an infinite state space,
this argument does not apply in general.

Markovian decision processes (henceforth MDP) are stochastic games with only one player. 
Lehrer and Sorin [5] showed that in an MDP, the uniform convergence of $(v_\lambda)$ (w.r.t. the initial
state) as $\lambda$ tends to zero is equivalent to the uniform convergence of the n-stage values $(v_n)$ as
$n$ tends to infinity. Nevertheless, uniform convergence of $(v_n)$ or $(v_\lambda)$ is not sufficient for the
existence of the uniform value (cf. Monderer and Sorin [9] or Lehrer and Monderer [4]).

For recursive games, the situation seems to be different. There are two results giving sufficient
conditions for a recursive game with countable state space to have a uniform value. The first
one can be derived from Rosenberg and Vieille [12]: if $(v_\lambda)$ converges uniformly to some function
$v$, then the recursive game has a uniform value, which is equal to $v$. The second one is due to
Solan and Vieille [14]: if, except on a finite subset, the limsup value¹ is above a strictly positive
constant on the non-absorbing states, then the recursive game has a uniform value, which is
equal to the limsup value.

¹ The limsup value is the value of the game in which the global payoff to player 1 is the limsup of the stage
payoff stream.

The main result of this paper is that the uniform convergence of the n-stage values is a sufficient
condition for the existence of the uniform value. In fact we prove a stronger result: for any
recursive game with countable state space, if the family $\{v_n, n \geq 1\}$ is totally bounded for the
uniform norm, then the uniform value exists. Our proof follows the same idea as Solan and
Vieille [14] and we will use several of their results.

Our result together with the result of Rosenberg and Vieille [12] provides a uniform Tauberian
theorem for recursive games: $(v_n)$ converges uniformly if and only if $(v_\lambda)$ converges uniformly,
and in case of convergence, both limits are the same. For general stochastic games, Ziliotto [20]
recently provided a direct proof of this result.

Finally, we apply our main result to finite recursive games with signals. In a recursive game
with signals, players no longer perfectly observe the state and actions at every stage;
rather, they receive a private signal. Mertens [6] conjectured that in a general model of zero-sum
repeated games, if player 1 (the maximizer) is always more informed than player 2 (the
minimizer) during the play (in the sense that player 2's private signal can be deduced from
player 1's private signal), then $\mathrm{Maxmin} = \lim_{n\to\infty} v_n = \lim_{\lambda\to 0} v_\lambda$, i.e., both the uniform maxmin
and the asymptotic value exist and are equal.

Ziliotto [19] showed that the result is false in general. Nevertheless, several positive results
have been obtained for subclasses of games, including Sorin [15] and Sorin [16] for the Big Match with
one-sided incomplete information, Rosenberg et al. [11], Renault [10] and Gensbittel et al. [3]
for a more informed controller, and Rosenberg and Vieille [12] for recursive games with one-sided
incomplete information.

We prove the Mertens conjecture in finite recursive games with signals, where player 1
is always more informed than player 2 during the play. The proof uses several results from
Gensbittel et al. [3], concerning the n-stage value functions in a repeated game where player 1
is more informed than player 2. Our result generalizes Rosenberg and Vieille [12], which deals
with the model where player 1 is informed of a private signal on the state at the beginning of
the game. Moreover, we deduce the existence of the uniform value in finite recursive games with
symmetric information.

The organization of the article is as follows: in Section 2 we introduce the model of recursive
games; in Section 3 we present the main result and several corollaries; Section 4 is dedicated to
the proofs; finally, in Section 5 we apply the result to finite recursive games with signals.


2 Preliminaries: model and notations 

Notation Given any metric space $S$, endowed with the Borel $\sigma$-algebra, we denote by $\Delta(S)$
the set of probabilities on $S$ and we denote by $\Delta_f(S)$ the set of probabilities with finite support.

2.1 The model 

A two-player zero-sum stochastic game $\Gamma = \langle X, A, B, g, q \rangle$ is given by

• a state space $X$,

• player 1's action set $A$, and for any $x \in X$, $A(x)$ is a finite subset of $A$,

• player 2's action set $B$, and for any $x \in X$, $B(x)$ is a finite subset of $B$,

• a payoff function $g : X \times A \times B \to [-1, +1]$,

• a transition probability function $q : X \times A \times B \to \Delta_f(X)$.

Play of the game The stochastic game with initial state $x_1 \in X$ is denoted by $\Gamma(x_1)$, and is
played as follows: at each stage $t \geq 1$, after observing $(x_1, a_1, b_1, \dots, x_t)$, player 1 and
player 2 choose simultaneously actions $a_t \in A(x_t)$ and $b_t \in B(x_t)$. The stage payoff is $g(x_t, a_t, b_t)$
and a new state $x_{t+1}$ is drawn according to the probability distribution $q(x_t, a_t, b_t)$. Both players
observe the action pair $(a_t, b_t)$ and the state $x_{t+1}$. The game then proceeds to stage $t + 1$.

Note that we did not make any measurability assumption on the model. As the transition probability
distribution is supposed to be finitely supported, given an initial state, the set of actions
and states that might appear in the infinite game are in fact countable. Therefore probability
distributions are well defined.

Recursive game $\Gamma$ is a recursive game if there exist a set of active states denoted by $X^0$ and
a set of absorbing states denoted by $X^*$ with $X^0 \cup X^* = X$ and $X^0 \cap X^* = \emptyset$, such that:

• the stage payoff is 0 on active states: $\forall x \in X^0$, $g(x, a, b) = 0$, $\forall (a, b) \in A(x) \times B(x)$;

• states in $X^*$ are absorbing: $\forall x \in X^*$, $q(x, a, b)(x) = 1$, $\forall (a, b) \in A(x) \times B(x)$, and $g(x, a, b)$
depends only on $x$.

2.2 Definition of strategies and evaluations 

History At stage $t$, the space of finite histories is $H_t = (X \times A \times B)^{t-1} \times X$. Set $H_\infty = (X \times A \times B)^\infty$
to be the space of infinite plays. We consider the discrete topology on $X$, $A$ and $B$. For every
$t \geq 1$, we identify any $h_t \in H_t$ with a cylinder set in $H_\infty$ and denote by $\mathcal{H}_t$ the $\sigma$-field of $H_t$
induced on $H_\infty$. The product $\sigma$-field on $H_\infty$ is $\mathcal{H}_\infty = \sigma(\mathcal{H}_t, t \geq 1)$.

Strategy A (behavior) strategy for player 1 is a sequence of functions $\sigma = (\sigma_t)_{t \geq 1}$ with, for each
$t \geq 1$, $\sigma_t : (H_t, \mathcal{H}_t) \to \Delta(A)$ such that for every $h_t \in H_t$, $\sigma_t(h_t)(A(x_t)) = 1$. If for every $t \geq 1$
and $h_t \in H_t$, there exists $a \in A(x_t)$ such that $\sigma_t(h_t)[a] = 1$, then the strategy is pure. We define
similarly a behavior strategy $\tau$ for player 2. Denote by $\Sigma$ and $\mathcal{T}$ respectively player 1's and
player 2's sets of behavior strategies. Denote by $\widetilde{\Sigma} \subset \Sigma$ and $\widetilde{\mathcal{T}} \subset \mathcal{T}$ respectively player 1's and player 2's
subsets of strategies that depend on the histories only through the states but not on the actions.

Evaluations Let us describe several ways to evaluate the payoff in $\Gamma$. By Kolmogorov's extension
theorem, any triple $(x_1, \sigma, \tau) \in X \times \Sigma \times \mathcal{T}$ induces a unique probability distribution over
$(H_\infty, \mathcal{H}_\infty)$ denoted by $\mathbb{P}_{x_1,\sigma,\tau}$. Let $\mathbb{E}_{x_1,\sigma,\tau}$ be the corresponding expectation.

n-stage average For each integer $n \geq 1$, the expected average payoff up to stage $n$, induced by
the couple of strategies $(\sigma, \tau)$ and the initial state $x_1$ is given by

$$\gamma_n(x_1, \sigma, \tau) = \mathbb{E}_{x_1,\sigma,\tau}\left( \frac{1}{n} \sum_{t=1}^{n} g(x_t, a_t, b_t) \right).$$

The game with expected n-stage average payoff and initial state $x_1$ is denoted as $\Gamma_n(x_1)$.

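$\lambda$-discounted average For each $\lambda \in (0,1]$, the expected $\lambda$-discounted average payoff, induced by
the couple of strategies $(\sigma, \tau)$ and the initial state $x_1$ is given by

$$\gamma_\lambda(x_1, \sigma, \tau) = \mathbb{E}_{x_1,\sigma,\tau}\left( \lambda \sum_{t=1}^{\infty} (1-\lambda)^{t-1} g(x_t, a_t, b_t) \right).$$

The game with expected $\lambda$-discounted average payoff and initial state $x_1$ is denoted as $\Gamma_\lambda(x_1)$.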

In either $\Gamma_n(x_1)$ or $\Gamma_\lambda(x_1)$, player 1 maximizes the expected average payoff and player 2
minimizes it. For a fixed $x_1$ the game $\Gamma_n(x_1)$ is finite, so there exists a value $v_n(x_1)$ by the minmax
theorem. The existence of the discounted value $v_\lambda(x_1)$ is also standard, and we refer to Mertens
et al. [8] (Section VII.1) for a general presentation.
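
As an illustration of the two evaluations (not part of the original text), the following Python sketch computes the n-stage average and the $\lambda$-discounted average of a given stream of stage payoffs. The function names and the sample stream are assumptions made for this example; the stream mimics a play of a recursive game that stays on active states for two stages and is then absorbed with payoff 1, truncated at 200 stages.

```python
def n_stage_average(payoffs, n):
    """Average of the first n stage payoffs g_1, ..., g_n."""
    if n < 1 or n > len(payoffs):
        raise ValueError("need 1 <= n <= len(payoffs)")
    return sum(payoffs[:n]) / n


def discounted_average(payoffs, lam):
    """Abel mean lam * sum_t (1 - lam)^(t-1) * g_t over the finite stream given."""
    if not 0 < lam <= 1:
        raise ValueError("lam must lie in (0, 1]")
    return lam * sum((1 - lam) ** (t - 1) * g
                     for t, g in enumerate(payoffs, start=1))


# Hypothetical play: payoff 0 while active (stages 1-2), then absorbed with payoff 1.
stream = [0, 0] + [1] * 198
```

For a play absorbed at stage $p$ with payoff $c$, the same quantities have the closed forms $c \max(0, n-p+1)/n$ and $c(1-\lambda)^{p-1}$, which the functions reproduce up to the truncation of the stream.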

2.3 Stopping time and concatenation of strategies 

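A function $\theta : (H_\infty, \mathcal{H}_\infty) \to \mathbb{N}$ is called a stopping time if the set $\{h \in H_\infty \mid \theta(h) = t\}$ is $\mathcal{H}_t$-measurable
for all $t \geq 1$. Explicitly, for any $h, h' \in H_\infty$ and $n \geq 1$: if $h$ and $h'$ coincide until stage
$n$ and $\theta(h) = n$ then $\theta(h') = n$. Let $\theta$ and $\theta'$ be two stopping times; we write $\theta \leq \theta'$ if for every
$h \in H_\infty$, $\theta(h) \leq \theta'(h)$.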

Given a sequence of strategies $(\sigma^\ell)_{\ell \geq 1}$ and a sequence of increasing stopping times $(\theta_\ell)_{\ell \geq 1}$,
we define $\sigma^* := \sigma^1 \theta_1 \sigma^2 \theta_2 \cdots$ as the concatenation of $(\sigma^\ell)_{\ell \geq 1}$ along $(\theta_\ell)_{\ell \geq 1}$. Given $n \geq t \geq 1$
and $h \in H_\infty$, let $h_n$ be the projection of $h$ on $H_n$ and $h_t^n$ be the history of $h$ between stage $t$
and $n$. The strategy $\sigma^*$ is defined by $\sigma^*(h_n) = \sigma^1(h_n)$ if $n < \theta_1(h)$; $\sigma^*(h_n) = \sigma^m(h^n_{\theta_{m-1}})$ if
$\theta_{m-1} \leq n < \theta_m$. Informally, for every $\ell \geq 1$, at stage $\theta_\ell$ the player forgets the past and starts to
play $\sigma^{\ell+1}$ at the current state.

2.4 Uniform value 

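Uniformly guarantee Player 1 uniformly guarantees $w$ if for every $\varepsilon > 0$, there exists $\sigma^\varepsilon$ in $\Sigma$
and $N_0 \geq 1$ such that for every $x_1 \in X^0$,

$$\gamma_n(x_1, \sigma^\varepsilon, \tau) \geq w(x_1) - \varepsilon, \quad \forall n \geq N_0, \ \forall \tau \in \mathcal{T}.$$

We say that the strategy $\sigma^\varepsilon$ uniformly guarantees $w - \varepsilon$. Similarly, player 2 uniformly guarantees
$w$ if for every $\varepsilon > 0$, there exists $\tau^\varepsilon$ in $\mathcal{T}$ and $N_0 \geq 1$ such that for every $x_1 \in X^0$,

$$\gamma_n(x_1, \sigma, \tau^\varepsilon) \leq w(x_1) + \varepsilon, \quad \forall n \geq N_0, \ \forall \sigma \in \Sigma.$$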

Uniform value $v_\infty : X \to \mathbb{R}$ is the uniform value of the game $\Gamma$ if both players uniformly
guarantee $v_\infty$. A strategy for player 1 (resp. player 2) that uniformly guarantees $v_\infty - \varepsilon$ (resp.
$v_\infty + \varepsilon$) is called uniform $\varepsilon$-optimal. If both players can uniformly guarantee $v_\infty$ with pure
strategies, $\Gamma$ has a uniform value in pure strategies.

Remark 2.1 In defining the uniform value, we ask $N_0$ to be independent of the initial state $x_1$.
One direct consequence of the existence of the uniform value $v_\infty$ is the uniform convergence of
$(v_n)_{n \geq 1}$ to $v_\infty$. This is stronger than the definition where the existence of the uniform value is
considered state by state (see for example Solan and Vieille [14], Definitions 3-4).


3 Main results


In this section, we present the main result of the paper, namely Theorem 3.1, as well as several
corollaries. We also provide an example that does not satisfy the condition of Theorem 3.1 and
does not have a uniform value.

3.1 Sufficient condition for the existence of the uniform value 

Denote by $B(X)$ the set of functions from $X$ to $[-1,1]$ with the uniform norm $\|\cdot\|_\infty$. Recall
that a set of functions $F$ in $(B(X), \|\cdot\|_\infty)$ is totally bounded if for every $\varepsilon > 0$, there exists a finite
subset $F_R = \{f_r : 1 \leq r \leq R\} \subset F$ such that for any $f \in F$, there is $f_r \in F_R$ with $\|f - f_r\|_\infty \leq \varepsilon$.

Theorem 3.1 Suppose that the family $\{v_n, n \geq 1\}$ is totally bounded for the uniform norm. Then
the recursive game $\Gamma$ has a uniform value $v_\infty$. Moreover, both players can uniformly guarantee
$v_\infty$ with strategies that depend only on the history of states and not on past actions.

We deduce from the previous result a uniform Tauberian theorem in recursive games. 

Corollary 3.2 The sequence of n-stage values $(v_n)_{n \geq 1}$ converges uniformly as $n$ tends to infinity
if and only if the sequence of $\lambda$-discounted values $(v_\lambda)_{\lambda \in (0,1]}$ converges uniformly as $\lambda$ tends to
zero. In case of convergence, both limits are the same.

On one hand, if $(v_n)$ converges uniformly, the family is totally bounded, thus the uniform value
exists, and this implies the uniform convergence of $(v_\lambda)$ (Sorin [17], Lemma 3.1). On the other
hand, the converse result is established in Rosenberg and Vieille [12] (see Remark 6, Theorem 1
and Theorem 3).

Remark 3.3 The equivalence of the uniform convergences of $(v_n)_{n \geq 1}$ and $(v_\lambda)_{\lambda \in (0,1]}$ has been
proven in MDP by Lehrer and Sorin [5]. Ziliotto [20] recently showed that it is also true for
stochastic games whenever the Shapley operator is well defined.

If, in addition, for every $n \geq 1$ the n-stage value $v_n(x)$ exists in pure strategies, then $\Gamma$ has a
uniform value in pure strategies.

Corollary 3.4 Suppose that for every $n \geq 1$, both players have pure optimal strategies in the
n-stage game, and $\{v_n, n \geq 1\}$ is totally bounded for the uniform norm. Then $\Gamma$ has a uniform
value $v_\infty$ in pure strategies. Moreover, both players can uniformly guarantee $v_\infty$ with strategies
that depend only on the history of states and not on past actions.

Remark 3.5 The result in Corollary 3.4 extends to games with general action sets $A(x)$ and
$B(x)$ provided that for any $n \geq 1$, the n-stage game has a value and both players have pure
optimal strategies.

The proof of Corollary 3.4 is similar to that of Theorem 3.1. The key difference involves a
technical lemma (Lemma 4.18) for the existence of a (pure) stopping time which is used in the
definition of players' optimal strategies (see the proof of Proposition 4.3). We discuss this point
and present the proof in Subsection 4.3.

3.2 A recursive game without uniform value

We present here an example of a recursive game with countable state space where $\{v_n, n \geq 1\}$ is
not totally bounded and there is no uniform value (see Figure 3.2 below for illustration). This
is an adaptation to our framework of an example in Lehrer and Sorin [5].


The state space is a subset of $\mathbb{Z} \times \mathbb{Z}$. The set of active states is $X^0 = \{(x, y) \in \mathbb{N} \times \mathbb{N} \mid 0 \leq y \leq x\}$
and the set of absorbing states is $X^* = X_1^* \cup X_2^*$ (two types), where $X_1^* = \mathbb{N} \times \{-1\}$ and $X_2^* =
\{(x, x+1) \mid x \geq 0\}$. The payoff is 1 on $X_1^*$ and is $-2$ on $X_2^*$. There is only one player (maximizer),
whose action set is {R(ight), J(ump)}. The transition rule is given by:

• at $(x, 0) \in X^0$: $q((x, 0), R)(x+1, 0) = 1$, and $q((x, 0), J)(x, -1) = q((x, 0), J)(x, 1) = \frac{1}{2}$;

• at $(x, y) \in X^0$ with $0 < y \leq x$: $q((x, y), a)(x, y+1) = 1$, $\forall a \in \{R, J\}$.


Starting at (0,0), one optimal strategy for an n-stage game is to go Right for half of the game,
and then to Jump. This gives an expected average payoff around $\frac{1}{4}$, thus $\lim_{n\to\infty} v_n(0,0) = \frac{1}{4}$.

In a $\lambda$-discounted game, the optimal stage to Jump is approximately $\frac{\ln(1/4)}{\ln(1-\lambda)}$. It follows that
$v_\lambda(0,0) \approx \frac{1}{16}$ and thus $\lim_{\lambda\to 0} v_\lambda(0,0) = \frac{1}{16}$. This implies that there is no uniform value. On the
other hand, $\{v_n, n \geq 1\}$ is not totally bounded for the uniform norm. Indeed, the convergence of
$(v_n)$ is not uniform: for any $x \geq 1$, $\lim_{n\to\infty} v_n(x, 1) = -2$ while $v_x(x, 1) = 0$.



Figure 3.2: a play $(R, \dots, R, J)$ jumping after $n$ steps: with probability 1/2 the state is absorbed at
$(n, n+1) \in X_2^*$, with probability 1/2 the state is absorbed at $(n, -1) \in X_1^*$. (In the figure, one arrow
style denotes a deterministic transition and the other a probabilistic transition.)
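
The gap between the two asymptotic evaluations in this example can be checked numerically. The Python sketch below is an illustration added here, not the authors' computation. It assumes the dynamics described above: from (0,0), a Jump at stage k leads with probability 1/2 to an absorbing state of payoff 1 from stage k+1 on, and with probability 1/2 to an absorbing state of payoff -2 reached at stage 2k. The function names and the search cutoff `int(10 / lam) + 1` are assumptions of this sketch.

```python
def n_stage_value(n):
    """Best n-stage average payoff from (0, 0), searching over the jump stage k."""
    best = 0.0  # never jumping keeps the play on active states: payoff 0
    for k in range(1, n + 1):
        good = n - k                 # stages k+1..n in the absorbing state with payoff 1
        bad = max(0, n - 2 * k + 1)  # stages 2k..n in the absorbing state with payoff -2
        best = max(best, (0.5 * good - 0.5 * 2 * bad) / n)
    return best


def discounted_value(lam):
    """Best lam-discounted payoff from (0, 0), searching over the jump stage k."""
    if not 0 < lam <= 1:
        raise ValueError("lam must lie in (0, 1]")
    best = 0.0
    # Assumed cutoff: later jump stages give payoffs too small to matter.
    for k in range(1, int(10 / lam) + 2):
        payoff = 0.5 * (1 - lam) ** k - (1 - lam) ** (2 * k - 1)
        best = max(best, payoff)
    return best
```

Under these assumptions, `n_stage_value(n)` stays close to 1/4 for large `n` while `discounted_value(lam)` stays close to 1/16 for small `lam`, in line with the absence of a uniform value.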


4 Proofs 

In the first subsection, we introduce and establish preliminary results for a subclass of recursive
games, which will be called positive-valued recursive games. In the second subsection, we prove
Theorem 3.1 by a reduction of any recursive game to a positive-valued recursive game. The proof
of Corollary 3.4 is given in the third subsection.

4.1 The case of positive-valued recursive games

Definition 4.1 A recursive game is positive-valued if there exist $M > 0$ and $n_0 \geq 1$ such that
for every non-absorbing state $x \in X^0$, there exists $n(x) \leq n_0$ such that $v_{n(x)}(x) \geq M$.

In order to state the next proposition, we first introduce the notion of uniformly terminating 
strategy. 

Definition 4.2 Denote by $\rho$ the stopping time of absorption in $X^*$: $\rho = \inf\{n \geq 1, x_n \in X^*\}$.
The strategy $\sigma$ is said to be uniformly terminating if for any $\varepsilon > 0$, there exists $N \geq 1$ such that
for every $x_1 \in X^0$ and for every $\tau \in \mathcal{T}$, $\mathbb{P}_{x_1,\sigma,\tau}(\rho \leq N) \geq 1 - \varepsilon$.


Proposition 4.3 Let $\Gamma$ be a positive-valued recursive game. We fix the numbers $M > 0$, $n_0 \geq 1$
and the mapping $n(\cdot) : X^0 \to \{1, \dots, n_0\}$ such that $v_{n(x)}(x) \geq M$, $\forall x \in X^0$.

Then player 1 uniformly guarantees $v_{n(\cdot)}(\cdot)$ with uniformly terminating strategies that depend
only on states: for all $\varepsilon > 0$, there exist such a strategy $\sigma^*$ and $N_0 \geq 1$ such that for every $x_1 \in X^0$ and
every $\tau$ in $\mathcal{T}$,

$$(i)\ \mathbb{P}_{x_1,\sigma^*,\tau}(\rho \leq N_0) \geq 1 - \varepsilon \quad \text{and} \quad (ii)\ \gamma_n(x_1, \sigma^*, \tau) \geq v_{n(x_1)}(x_1) - \varepsilon, \ \forall n \geq N_0.$$

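Proof. Let $\hat\sigma$ be a profile of strategies such that for every $x \in X^0$, $\hat\sigma(x)$ is optimal in the
$n(x)$-stage game $\Gamma_{n(x)}(x)$. Let $k := k(x)$ be a random stage uniformly chosen in $\{1, \dots, n(x)\}$.
For any $\tau \in \mathcal{T}$ and $x \in X^0$, $(x, \hat\sigma(x), \tau)$ and $k$ induce a probability distribution over $H_\infty \times \{1, \dots, n(x)\}$,
which we denote by $\widetilde{\mathbb{P}}_{x,\hat\sigma,\tau}$. Let $\widetilde{\mathbb{E}}_{x,\hat\sigma,\tau}$ be the corresponding expectation. We obtain:

$$\widetilde{\mathbb{E}}_{x,\hat\sigma,\tau}\left[g(x_k)\right] = \mathbb{E}_{x,\hat\sigma,\tau}\left[\frac{1}{n(x)} \sum_{t=1}^{n(x)} g(x_t)\right] \geq v_{n(x)}(x) \geq M.$$

It follows that

$$\widetilde{\mathbb{E}}_{x,\hat\sigma,\tau}\left[g(x_k)\mathbf{1}_{\rho \leq k} + g(x_k)\mathbf{1}_{\rho > k}\right] \geq M.$$

On the event $\{\rho > k\}$, $g(x_k) = 0$, whereas on the event $\{\rho \leq k\}$, we have $g(x_k) = g(x_\rho)$. This
implies that

$$\widetilde{\mathbb{P}}_{x,\hat\sigma,\tau}(\rho \leq k)\, \widetilde{\mathbb{E}}_{x,\hat\sigma,\tau}\left[g(x_\rho) \mid \rho \leq k\right] = \widetilde{\mathbb{E}}_{x,\hat\sigma,\tau}\left[g(x_k)\right] \geq v_{n(x)}(x) \geq M. \qquad (4.1)$$

Using the fact that the payoff function $g$ has maximal norm 1, we deduce from (4.1):

$$\widetilde{\mathbb{P}}_{x,\hat\sigma,\tau}(\rho \leq k) \geq M. \qquad (4.2)$$

Define the strategy² $\sigma^*$ as the concatenation of $(\hat\sigma(x_{u_\ell}))_{\ell \geq 0}$ at the random stages $(u_\ell)_{\ell \geq 0}$, where $u_\ell$ is
defined inductively along the play by $u_0 = 1$ and $u_{\ell+1} - u_\ell = k(x_{u_\ell})$ follows the uniform distribution
over $\{1, \dots, n(x_{u_\ell})\}$. Let $\widetilde{\mathbb{P}}_{x,\sigma^*,\tau}$ be the (product) probability distribution over $H_\infty \times \{1, \dots, n_0\}^{\mathbb{N}}$
induced by $(x, \sigma^*, \tau)$, and $\widetilde{\mathbb{E}}_{x,\sigma^*,\tau}$ the corresponding expectation. Let $\varepsilon > 0$.

(i) We show that $\sigma^*$ is uniformly terminating. By (4.2), the conditional probability of absorbing
on each block $\{u_{\ell-1}, \dots, u_\ell - 1\}$ is no smaller than $M$. Thus for any $\tau$ and $x_1 \in X^0$,

$$\widetilde{\mathbb{P}}_{x_1,\sigma^*,\tau}(\rho \geq u_\ell) \leq (1 - M)^\ell, \quad \forall \ell \geq 1.$$

The length of each block is uniformly bounded by $n_0$, thus if we take $\ell^* \geq 1$ such that $(1 - M)^{\ell^*} \leq \varepsilon$:

$$\widetilde{\mathbb{P}}_{x_1,\sigma^*,\tau}(\rho \leq n_0 \ell^*) \geq \widetilde{\mathbb{P}}_{x_1,\sigma^*,\tau}(\rho < u_{\ell^*}) \geq 1 - (1 - M)^{\ell^*} \geq 1 - \varepsilon. \qquad (4.3)$$

(ii) We now argue that $\sigma^*$ uniformly guarantees $v_{n(x_1)}(x_1) - 3\varepsilon$. Let $N_0 = n_0 \ell^* / \varepsilon$. For any
$\tau \in \mathcal{T}$, $x_1 \in X^0$ and $n \geq n_0 \ell^*$, we have

$$\widetilde{\mathbb{E}}_{x_1,\sigma^*,\tau}\left[g(x_n)\right] = \widetilde{\mathbb{E}}_{x_1,\sigma^*,\tau}\left[\sum_{\ell=0}^{\ell^*-1} g(x_n)\mathbf{1}_{u_\ell \leq \rho < u_{\ell+1}} + g(x_n)\mathbf{1}_{\rho \geq u_{\ell^*}}\right]$$

$$= \sum_{\ell=0}^{\ell^*-1} \widetilde{\mathbb{P}}_{x_1,\sigma^*,\tau}(u_\ell \leq \rho < u_{\ell+1})\, \widetilde{\mathbb{E}}_{x_1,\sigma^*,\tau}\left[g(x_\rho) \mid u_\ell \leq \rho < u_{\ell+1}\right] + \widetilde{\mathbb{P}}_{x_1,\sigma^*,\tau}(u_{\ell^*} \leq \rho)\, \widetilde{\mathbb{E}}_{x_1,\sigma^*,\tau}\left[g(x_n) \mid \rho \geq u_{\ell^*}\right].$$

According to (4.3), $\widetilde{\mathbb{P}}_{x_1,\sigma^*,\tau}(\rho \geq u_{\ell^*}) \leq \varepsilon$, thus we focus on an absorption before $u_{\ell^*}$:

$$\widetilde{\mathbb{E}}_{x_1,\sigma^*,\tau}\left[g(x_n)\right] \geq \sum_{\ell=0}^{\ell^*-1} \widetilde{\mathbb{P}}_{x_1,\sigma^*,\tau}(u_\ell \leq \rho < u_{\ell+1})\, \widetilde{\mathbb{E}}_{x_1,\sigma^*,\tau}\left[g(x_\rho) \mid u_\ell \leq \rho < u_{\ell+1}\right] - \varepsilon. \qquad (4.4)$$

For each $\ell \geq 0$, $\sigma^*$ is following $\hat\sigma(x_{u_\ell})$ for $u_{\ell+1} - u_\ell = k(x_{u_\ell})$ stages. Thus (4.1) applies, and we
obtain: for $\ell \geq 1$,

$$\widetilde{\mathbb{P}}_{x_1,\sigma^*,\tau}(u_\ell \leq \rho < u_{\ell+1})\, \widetilde{\mathbb{E}}_{x_1,\sigma^*,\tau}\left[g(x_\rho) \mid u_\ell \leq \rho < u_{\ell+1}\right] \geq \widetilde{\mathbb{P}}_{x_1,\sigma^*,\tau}(\rho \geq u_\ell)\, M \geq 0,$$

and for $\ell = 0$,

$$\widetilde{\mathbb{P}}_{x_1,\sigma^*,\tau}(1 \leq \rho < u_1)\, \widetilde{\mathbb{E}}_{x_1,\sigma^*,\tau}\left[g(x_\rho) \mid 1 \leq \rho < u_1\right] \geq v_{n(x_1)}(x_1).$$

By substituting the two previous inequalities into (4.4), we obtain that

$$\forall n \geq n_0 \ell^*, \ \forall x_1 \in X^0, \quad \widetilde{\mathbb{E}}_{x_1,\sigma^*,\tau}\left[g(x_n)\right] \geq v_{n(x_1)}(x_1) - \varepsilon. \qquad (4.5)$$

Now for $n \geq N_0$, the first $n_0 \ell^*$ stages represent at most a fraction $\varepsilon$ of the $n$ stages and payoffs lie in
$[-1, 1]$, so we deduce that $\gamma_n(x_1, \sigma^*, \tau) \geq v_{n(x_1)}(x_1) - 3\varepsilon$. ■

² The strategy $\sigma^*$ is a generalized mixed strategy, which is equivalent to a behavior strategy by Kuhn's theorem.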
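
The constants in the proof above can be made concrete. The following Python sketch is an illustration under stated assumptions rather than part of the original argument. It assumes that the number of blocks $\ell^*$ is the smallest integer with $(1-M)^{\ell^*} \leq \varepsilon$, that the termination horizon is $n_0 \ell^*$, and that the guarantee horizon is $N_0 = n_0 \ell^* / \varepsilon$ rounded up. The function names are hypothetical.

```python
import math


def blocks_needed(M, eps):
    """Smallest l >= 1 with (1 - M)**l <= eps, found by direct search."""
    if not (0 < M <= 1 and 0 < eps < 1):
        raise ValueError("need 0 < M <= 1 and 0 < eps < 1")
    l, remaining = 1, 1.0 - M  # remaining = (1 - M)**l bounds P(no absorption yet)
    while remaining > eps:
        l += 1
        remaining *= 1.0 - M
    return l


def stage_bounds(M, n0, eps):
    """Return (l_star, termination horizon n0 * l_star, guarantee horizon N0)."""
    l_star = blocks_needed(M, eps)
    horizon = n0 * l_star              # each block lasts at most n0 stages
    return l_star, horizon, math.ceil(horizon / eps)
```

For instance, with $M = 0.5$ and $\varepsilon = 0.1$ the search returns four blocks, since $0.5^4 = 0.0625 \leq 0.1 < 0.5^3$.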


One can deduce from Proposition 4.3 a first result on recursive games with the condition that
the sequence of n-stage values converges uniformly to a function bounded away from 0.


Corollary 4.4 Assume that in a recursive game $\Gamma$, the sequence of n-stage values $(v_n)_{n \geq 1}$ converges
uniformly to a function $v$ satisfying for every $x \in X^0$, $v(x) \geq M' > 0$ for some $M'$. Then
$\Gamma$ is positive-valued and player 1 uniformly guarantees $v$ with uniformly terminating strategies.


4.2 Existence of the uniform value (proof of Theorem 3.1)

This subsection is devoted to the proof of Theorem 3.1: the total boundedness of $\{v_n, n \geq 1\}$
implies the existence of the uniform value $v_\infty$. We prove that player 1 guarantees the pointwise
limit superior value $x \mapsto v(x) := \limsup_n v_n(x)$. By symmetry, player 2 guarantees $\liminf_n v_n(x)$,
and the result follows.


The uniform $\varepsilon$-optimal strategy will use alternately two different types of strategies. This
approach is classical for recursive games and has been used for example in Rosenberg and Vieille
[12] and in Solan and Vieille [14]. Our construction is close to Solan and Vieille [14], in which
some similar "positive-valued recursive game" is introduced to make a reduction for the general
case.

The proof is decomposed into three parts. In the first one, we introduce a family of auxiliary 
positive-valued recursive games and define the first type of strategies. In the second part, we 
define the second type of strategies. Finally, we construct the strategy a* and prove that it is 
uniform e-optimal. 

Before proceeding to the proof, let us first prove a preliminary result, which shows that due
to the total boundedness of $\{v_n\}$, the pointwise limit superior of $(v_n)$ can be realized along
uniformly convergent subsequences. We fix a recursive game $\Gamma$ for the rest of this section.

Proposition 4.5 For every $x \in X$, we have

$$v(x) = \limsup_n v_n(x) = \max_{f \in F} f(x),$$

where $F$ is the set of limit points of the sequence $(v_n)_{n \geq 1}$ in $(B(X), \|\cdot\|_\infty)$.

Proof. $(B(X), \|\cdot\|_\infty)$ is a complete metric space and $(\{v_n\}, \|\cdot\|_\infty)$ is totally bounded, therefore
$F$ is compact and non-empty. For every $x \in X$, we denote $w(x) := \max_{f \in F} f(x)$. Fix $x \in X$.
Since $v(x)$ is the largest limit point of $(v_n(x))_{n \geq 1}$, we have $w(x) \leq v(x)$. By definition of the
limit superior, there exists a subsequence $(v_{n_k}(x))_{k \geq 1}$ which converges to $\limsup_n v_n(x)$. There
exists a subsequence of $(v_{n_k})_{k \geq 1}$ that converges in $(B(X), \|\cdot\|_\infty)$ to some $f^* \in F$, therefore

$$\max_{f \in F} f(x) \geq f^*(x) = v(x).$$


4.2.1 Reduction: auxiliary recursive games 


Auxiliary recursive games Let $\theta : X \to \{0,1\}$. We define the auxiliary recursive game
$\Gamma^\theta = \langle A, B, X = X^0_\theta \cup X^*_\theta, q_\theta, g_\theta \rangle$ where any active state $x \in X^0$ such that $\theta(x) = 1$ is seen as an
absorbing state: the active state space of $\Gamma^\theta$ is $X^0_\theta = \{x \in X^0, \theta(x) = 0\}$ and the absorbing state
space is $X^*_\theta = X^* \cup \{x \in X^0, \theta(x) = 1\}$. The transition $q_\theta$ is equal to $q$ and the payoff $g_\theta$ is equal
to $g$ on all states except $\{x \in X^0, \theta(x) = 1\}$, on which the state is absorbing and the absorbing
payoff is $g_\theta = v$. For every $n \geq 1$, let $v^\theta_n$ be the value of the n-stage auxiliary game $\Gamma^\theta_n$.

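Proposition 4.6 Let $\eta > 0$ and $\theta : X \to \{0,1\}$. There exists $n_0 \geq 1$ such that for every $x_1 \in X^0_\theta$,
there exists $n(x_1) \leq n_0$ with $v^\theta_{n(x_1)}(x_1) \geq v(x_1) - 4\eta$.

Proof. Let $\eta > 0$ be fixed and $F_R = \{f_1, \dots, f_R\} \subset F$ be a finite cover of size $\frac{\eta}{2}$ of the set $F$. As
$\{v_n, n \geq 1\}$ is totally bounded, there exists some stage $n(\eta) \in \mathbb{N}$, after which any n-stage value
$v_n$ is $\frac{\eta}{2}$-close to $F$, its set of accumulation points, hence $\eta$-close to $F_R$:

$$\exists n(\eta) \in \mathbb{N}, \ \forall n \geq n(\eta), \ \exists f_r \in \{f_1, \dots, f_R\}, \ \text{s.t.} \ \|v_n - f_r\|_\infty \leq \eta. \qquad (4.6)$$

Moreover for every $r \in \{1, \dots, R\}$, $f_r$ is an accumulation point of $\{v_n, n \geq 1\}$, therefore there exists
some $n_r \geq \frac{n(\eta)}{\eta}$ such that $v_{n_r}$ is $\eta$-close to $f_r$:

$$\forall f_r \in F_R, \ \exists n_r \geq \frac{n(\eta)}{\eta} \ \text{s.t.} \ \|v_{n_r} - f_r\|_\infty \leq \eta. \qquad (4.7)$$

Finally we take $n_0 = \max\{n_r : 1 \leq r \leq R\}$. The integers $n_r$ are chosen such that when absorption
in $X^*_\theta$ occurs in the game of length $n_r$, the remaining number of stages is either a fraction smaller
than $\eta$ of the total length of the game, or greater than $n(\eta)$ so that Equation (4.6) applies.

Let $x_1 \in X^0_\theta$ be any non-absorbing state in the auxiliary game $\Gamma^\theta$. By compactness of $F$,
there exists $f \in F$ such that $f(x_1) = v(x_1)$ and $f_r \in F_R$ with $\|f - f_r\|_\infty \leq \frac{\eta}{2}$. In particular at state
$x_1$,

$$f_r(x_1) \geq f(x_1) - \frac{\eta}{2} = v(x_1) - \frac{\eta}{2},$$

which together with (4.7) implies that

$$v_{n_r}(x_1) \geq f_r(x_1) - \eta \geq v(x_1) - \frac{3}{2}\eta. \qquad (4.8)$$

We now prove that

$$v^\theta_{n_r}(x_1) \geq v_{n_r}(x_1) - 2\eta. \qquad (4.9)$$

Denote by

$$\rho_\theta = \inf\{t \geq 1 : x_t \in X^*_\theta\} = \inf\{t \geq 1 : x_t \in X^* \text{ or } \theta(x_t) = 1\}$$

the stopping time associated to absorption in $\Gamma^\theta$, and set $\rho^{n_r}_\theta = \min(\rho_\theta, n_r)$. An adaptation of the
standard proof technique of the Shapley equation gives us:

$$v_{n_r}(x_1) = \max_{\sigma \in \Sigma} \min_{\tau \in \mathcal{T}} \mathbb{E}_{x_1,\sigma,\tau}\left[\frac{1}{n_r}\sum_{t=1}^{\rho^{n_r}_\theta - 1} g(x_t) + \frac{n_r - \rho^{n_r}_\theta + 1}{n_r}\, v_{n_r - \rho^{n_r}_\theta + 1}\big(x_{\rho^{n_r}_\theta}\big)\right].$$

We separate the histories into two sets depending on whether $n_r - \rho^{n_r}_\theta(h) + 1 \geq n(\eta)$, in which
case Equation (4.6) applies, or $n_r - \rho^{n_r}_\theta(h) + 1 < n(\eta)$, in which case $\frac{n_r - \rho^{n_r}_\theta + 1}{n_r} \leq \eta$ (by definition
$n_r \geq \frac{n(\eta)}{\eta}$), and deduce that

$$v_{n_r}(x_1) \leq \max_{\sigma \in \Sigma} \min_{\tau \in \mathcal{T}} \mathbb{E}_{x_1,\sigma,\tau}\left[\frac{1}{n_r}\sum_{t=1}^{\rho^{n_r}_\theta - 1} g(x_t) + \frac{n_r - \rho^{n_r}_\theta + 1}{n_r}\, f'_h\big(x_{\rho^{n_r}_\theta}\big)\right] + 2\eta,$$

with $f'_h \in F_R$ depending on the history, given by Equation (4.6) applied to $v_{n_r - \rho^{n_r}_\theta + 1}$ when $n_r -
\rho^{n_r}_\theta + 1 \geq n(\eta)$, and any function in $F_R$ otherwise. Therefore, by considering $v$ as the supremum
of $f \in F$ at each point $x_{\rho^{n_r}_\theta} \in X$, we have $f'_h(x_{\rho^{n_r}_\theta}) \leq v(x_{\rho^{n_r}_\theta})$, thus

$$v_{n_r}(x_1) \leq \max_{\sigma \in \Sigma} \min_{\tau \in \mathcal{T}} \mathbb{E}_{x_1,\sigma,\tau}\left[\frac{1}{n_r}\sum_{t=1}^{\rho^{n_r}_\theta - 1} g(x_t) + \frac{n_r - \rho^{n_r}_\theta + 1}{n_r}\, v\big(x_{\rho^{n_r}_\theta}\big)\right] + 2\eta = v^\theta_{n_r}(x_1) + 2\eta.$$

This proves inequality (4.9). We now use Equation (4.8) and Equation (4.9) to conclude:

$$v^\theta_{n_r}(x_1) \geq v(x_1) - 4\eta.$$

It means that for each $x_1 \in X^0_\theta$ there exists $n(x_1) := n_r \leq n_0 = \max\{n_r : 1 \leq r \leq R\}$, such that
$v^\theta_{n(x_1)}(x_1) \geq v(x_1) - 4\eta$.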

Remark 4.7 Proposition 4.6 is also true if $\theta$ is a deterministic stopping time and not only a
function on the state. The auxiliary game would be defined on a larger state space: the set of
finite histories of the original game. The proof in itself is similar.

Fix now any $\varepsilon > 0$ and define $\theta_\varepsilon : X \to \{0,1\}$ such that $\{x \in X, \theta_\varepsilon(x) = 1\} = \{x \in X, v(x) \leq \varepsilon\}$.
We denote by $\Gamma^\varepsilon = \langle A, B, X = X^0_\varepsilon \cup X^*_\varepsilon, q_\varepsilon, g_\varepsilon \rangle$ the auxiliary game associated to $\Gamma$ defined by the
stopping time $\theta_\varepsilon$.

Corollary 4.8 In the game $\Gamma^\varepsilon$, player 1 uniformly guarantees $v$ with uniformly terminating
strategies that depend only on past states.

Proof. Let $\eta \in (0, \varepsilon/8]$. By Proposition 4.6 there exists $n_0 \geq 1$ such that for every $x_1 \in X^0_\varepsilon$, there
exists $n(x_1) \leq n_0$ with

$$v^\varepsilon_{n(x_1)}(x_1) \geq v(x_1) - 4\eta \geq \varepsilon/2, \qquad (4.10)$$

where the second inequality comes from the definition of $X^0_\varepsilon$. Therefore, $\Gamma^\varepsilon$ is a positive-valued
recursive game and by Proposition 4.3 player 1 uniformly guarantees $v^\varepsilon_{n(\cdot)}(\cdot)$ with uniformly terminating
strategies that depend only on past states. By Equation (4.10), it follows that for every $\eta > 0$, player 1 uniformly
guarantees $v - 4\eta$ with uniformly terminating strategies. ■

Fix now a strategy $\sigma^*$ that is uniformly terminating in $\Gamma^\varepsilon$, depends only on past states and
guarantees $v(x_1) - \varepsilon^2$ in $\Gamma^\varepsilon(x_1)$ for every $x_1 \in X^0_\varepsilon$.

4.2.2 One-shot game 

One-shot game $G^f$ For each $f : X \to [-1, +1]$ and $x_1 \in X$, we define the one-shot game $G^f(x_1)$ as
follows: player 1's action set is $A(x_1)$, player 2's action set is $B(x_1)$, and the payoff is for each
$(s, t) \in \Delta(A(x_1)) \times \Delta(B(x_1))$,

$$\mathbb{E}_{q(x_1,s,t)}\left[f(x_2)\right] = \sum_{a \in A(x_1),\, b \in B(x_1)} s(a)\, t(b) \sum_{x_2 \in X} q(x_1, a, b)(x_2)\, f(x_2).$$

Lemma 4.9 For any limit point $f \in F$, the one-shot game $G^f$ has a value equal to $f$.
Proof. Let $n \geq 1$; it is known that (cf. Vigeral [18], p. 40, Lemma 4.2.2)





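and by Shapley's formula that

$$v_{n+1}(x_1) = \sup_{s \in \Delta(A(x_1))} \inf_{t \in \Delta(B(x_1))} \mathbb{E}_{q(x_1,s,t)}\left[\frac{1}{n+1}\, g(x_1) + \frac{n}{n+1}\, v_n(x_2)\right]$$

$$= \inf_{t \in \Delta(B(x_1))} \sup_{s \in \Delta(A(x_1))} \mathbb{E}_{q(x_1,s,t)}\left[\frac{1}{n+1}\, g(x_1) + \frac{n}{n+1}\, v_n(x_2)\right].$$

We obtain the result by taking the limit along a subsequence converging uniformly to $f \in F$.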


Following Proposition 4.5, one can take for each $x \in X$ some $f^* \in F$ such that $v(x) = f^*(x) \geq
f(x)$, $\forall f \in F$. Then the following result is a direct consequence of Lemma 4.9.

Corollary 4.10 For every $x_1 \in X$, there exists $s^*(x_1) \in \Delta(A(x_1))$ such that

$$\forall b \in B(x_1), \quad \mathbb{E}_{q(x_1, s^*(x_1), b)}\left[v(x_2)\right] \geq v(x_1).$$

Fix now $s^* := (s^*(x_1))_{x_1 \in X}$, a profile of strategies satisfying the conclusion of Corollary 4.10.

4.2.3 Optimal strategy

Roughly speaking, we build a uniform $\varepsilon$-optimal strategy $\sigma$ for player 1 as follows: play $\sigma^*$ in $\Gamma^\varepsilon$ on the
states with value $v$ above $2\varepsilon$, and play $s^*$ in $G^v$ on the states with value $v$ below $\varepsilon$. For
the states with value $v$ between $\varepsilon$ and $2\varepsilon$, $\sigma$ will be either of the two depending on the regime.

Construction of $\sigma$ Define a sequence of stopping times $(u_\ell)_{\ell \geq 1}$ and the concatenated strategy
$\sigma := s^* u_1 \sigma^* u_2 s^* u_3 \sigma^* u_4 \cdots$ in $\Gamma$ as follows:

• $\sigma$ is to play $s^*(x_n)$ at each stage $n$ up to stage (not included)

$$u_1 = \inf\{n \geq 1, \ v(x_n) \geq 2\varepsilon\};$$

and then to play $\sigma^*(x_{u_1})$ up to stage (not included)

$$u_2 = \inf\{n \geq u_1, \ v(x_n) \leq \varepsilon\}.$$

• In general: for each $r \geq 1$, $\sigma$ is to play $\sigma^*(x_{u_{2r-1}})$ from stage $u_{2r-1}$ (the odd phase) up to
stage (not included)

$$u_{2r} = \inf\{n \geq u_{2r-1}, \ v(x_n) \leq \varepsilon\},$$

and then to play $s^*(x_n)$ at each stage $n \geq u_{2r}$ (the even phase), up to stage (not included)

$$u_{2r+1} = \inf\{n \geq u_{2r}, \ v(x_n) \geq 2\varepsilon\}.$$
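
The alternation rule above is a hysteresis scheme on the values $v(x_n)$ along the play, which can be sketched in Python as follows. This is an illustration only: the function name, the encoding of the play as a list of values, and the non-strict thresholds (a switch to $\sigma^*$ when $v(x_n) \geq 2\varepsilon$, a switch back to $s^*$ when $v(x_n) \leq \varepsilon$) are assumptions of this sketch, and absorption is ignored.

```python
def switching_times(values, eps):
    """Return the stages u_1 < u_2 < ... at which the regime changes.

    values[n - 1] stands for v(x_n), the target value at the state of stage n.
    The play starts in an even phase (one-shot strategies s*), moves to an odd
    phase (strategy sigma*) once v(x_n) >= 2 * eps, and returns to an even
    phase once v(x_n) <= eps.
    """
    times = []
    even_phase = True  # stage 1 starts with the one-shot strategies s*
    for n, v in enumerate(values, start=1):
        if even_phase and v >= 2 * eps:
            times.append(n)      # odd-indexed switching time: start sigma*
            even_phase = False
        elif not even_phase and v <= eps:
            times.append(n)      # even-indexed switching time: back to s*
            even_phase = True
    return times
```

With `eps = 0.1`, the hypothetical trajectory `[0.0, 0.3, 0.15, 0.05, 0.25]` switches at stages 2, 4 and 5: values strictly between $\varepsilon$ and $2\varepsilon$ (such as 0.15) never trigger a switch, which is the role of the intermediate band.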

Remark 4.11 The idea of alternating between two types of strategies is common in Rosenberg 
and Vieille m Solan and Vieille JZ2F and this article. The main difference is the definition 
of the target function v used to define how to switch from one type of strategies to the other. 
Rosenberg and Vieille m use the limit of discounted values and cr* is an optimal strategy in 
some \-discounted game (for A close to zero). Solan and Vieille use the limsup value and 
introduce an auxiliary positive-valued game. We adopt a similar approach to Solan and Vieille 
BF but with v the largest limit point of ( v n ). 
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By construction, a depends on the histories only through the states and not the actions. Let 
us show that a uniformly guarantees v-25e for player 1, which finishes the proof of Theorem 13. II 


Fix from now on any $x_1 \in X$. Recall that $\rho$ denotes the absorption time in the game $\Gamma$. The
next result shows that the process $(v(x_{\min(\rho,u_l)}))_{l \ge 1}$, which is the value of $v$ at switching times
$(u_l)$, is almost a submartingale up to an error of $\varepsilon^2$.


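Proposition 4.12 For every $l \ge 1$ and every $\tau \in \mathcal{T}$:

$$\mathbb{E}_{x_1,\sigma,\tau}\big[v(x_{\min(\rho,u_{l+1})}) \,\big|\, \mathcal{H}_{\min(\rho,u_l)}\big] \ge v(x_{\min(\rho,u_l)}) - \varepsilon^2$$

on the event $\min(\rho,u_l) < +\infty$.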


Proof. Take any $\tau$ in $\mathcal{T}$. The result is true if $\rho \le u_l$. Suppose that $l$ is even and $\rho > u_l$. By
construction the strategy $(s^*(x_n))$ is used during the phase $n \in \{u_l, \ldots, u_{l+1} - 1\}$, thus:

$$\mathbb{E}_{x_1,\sigma,\tau}[v(x_{n+1}) \,|\, \mathcal{H}_n] \ge v(x_n), \quad \text{for all } u_l \le n < \min(\rho,u_{l+1}).$$

Therefore $(v(x_n))$ is a bounded submartingale and by Doob's stopping theorem,

$$\mathbb{E}_{x_1,\sigma,\tau}\big[v(x_{\min(\rho,u_{l+1})}) \,\big|\, \mathcal{H}_{\min(\rho,u_l)}\big] \ge v(x_{\min(\rho,u_l)}).$$

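Suppose that $l$ is odd and $\rho > u_l$. By construction, player 1 is using $\sigma^*(x_{u_l})$, which uniformly
guarantees $v(x_{u_l}) - \varepsilon^2$ in the auxiliary game $\Gamma_\varepsilon(x_{u_l})$:

$$\exists N_0 \ge 1, \quad \mathbb{E}_{x_1,\sigma,\tau}\Big[\frac{1}{n}\sum_{t=u_l+1}^{u_l+n} g_\varepsilon(x_t) \,\Big|\, \mathcal{H}_{u_l}\Big] \ge v(x_{u_l}) - \varepsilon^2 \quad \text{for all } n \ge N_0. \tag{4.11}$$

Denote by $\rho_\varepsilon = \min\{m \ge u_l + 1 : x_m \in X_\varepsilon^*\}$ the absorption time in $\Gamma_\varepsilon(x_{u_l})$. Since in recursive
games the payoff is zero before absorption, we have

$$\mathbb{E}_{x_1,\sigma,\tau}\big[g_\varepsilon(x_{\rho_\varepsilon}) \,\big|\, \mathcal{H}_{u_l}\big] = \mathbb{E}_{x_1,\sigma,\tau}\Big[\lim_{n\to\infty}\frac{1}{n}\sum_{t=u_l+1}^{u_l+n} g_\varepsilon(x_t) \,\Big|\, \mathcal{H}_{u_l}\Big]. \tag{4.12}$$

By the dominated convergence theorem,

$$\mathbb{E}_{x_1,\sigma,\tau}\Big[\lim_{n\to\infty}\frac{1}{n}\sum_{t=u_l+1}^{u_l+n} g_\varepsilon(x_t) \,\Big|\, \mathcal{H}_{u_l}\Big] = \lim_{n\to\infty}\mathbb{E}_{x_1,\sigma,\tau}\Big[\frac{1}{n}\sum_{t=u_l+1}^{u_l+n} g_\varepsilon(x_t) \,\Big|\, \mathcal{H}_{u_l}\Big]. \tag{4.13}$$

We deduce from (4.11)-(4.13) that

$$\mathbb{E}_{x_1,\sigma,\tau}\big[g_\varepsilon(x_{\rho_\varepsilon}) \,\big|\, \mathcal{H}_{u_l}\big] \ge v(x_{u_l}) - \varepsilon^2.$$

Moreover, $g_\varepsilon(x_{\rho_\varepsilon}) = v(x_{\rho_\varepsilon})$ and, conditionally on $\rho > u_l$, $\rho_\varepsilon = \min(u_{l+1}, \rho)$. It follows that

$$\mathbb{E}_{x_1,\sigma,\tau}\big[v(x_{\min(\rho,u_{l+1})}) \,\big|\, \mathcal{H}_{u_l}\big] \ge v(x_{u_l}) - \varepsilon^2.$$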


Due to the possible error term $\varepsilon^2$, the sequence $(v(x_{\min(\rho,u_l)}))_{l \ge 1}$ is not a submartingale. Nevertheless,
one can prove a lemma similar to the usual upcrossing lemma for submartingales. Indeed,
the value is a martingale except if it crosses upwards the interval $[\varepsilon, 2\varepsilon]$. When this happens, the
value may decrease by at most $\varepsilon^2$. With the submartingale property established in Proposition
4.12, an easy adaptation of the standard result on the number of upcrossings of a submartingale implies
the following result, as was shown in Proposition 3 of Rosenberg and Vieille [12]:

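Lemma 4.13 Let $N = \sup\{p \ge 1 : u_{2p-1} < +\infty\}$ be the number of times the process $(v(x_{u_l}))$
crosses upward the interval $[\varepsilon, 2\varepsilon]$. For every $\tau \in \mathcal{T}$,

$$\mathbb{E}_{x_1,\sigma,\tau}[N] \le \frac{1}{\varepsilon - \varepsilon^2}.$$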

By construction, $\sigma^*$ is uniformly terminating within the auxiliary absorbing states $X_\varepsilon^*$. That
is to say, any play between stages $u_{2p-1}$ and $u_{2p}$ (on an odd phase) has bounded length with
high probability under the strategy $\sigma^*(x_{u_{2p-1}})$, uniformly over any starting state $x_{u_{2p-1}} \in X^0$.
Since Lemma 4.13 implies that the number of odd phases is bounded in expectation, the total
frequency of stages on all odd phases is negligible for $n$ large. Let us formalize this fact.

Recall that $\rho_\varepsilon$ denotes the absorption time in the auxiliary game $\Gamma_\varepsilon$. It follows that there
exists $N_1 > 0$ such that

$$\forall x \in X^0 \text{ and } \tau \in \mathcal{T}: \quad \mathbb{P}_{x,\sigma^*(x),\tau}(\rho_\varepsilon > N_1) \le \varepsilon^3. \tag{4.14}$$

For each $n \in \mathbb{N}$, define $A_n = \{u_{2p-1} \le n < \min(\rho, u_{2p}),\ u_{2p-1} < \rho, \text{ for some } p\} \subset H_\infty$. These are all
infinite plays where stage $n$ is in an odd phase, i.e., the stages between $u_{2p-1}$ and $u_{2p}$ on which
$\sigma^*(x_{u_{2p-1}})$ is used. We fix for the rest of the subsection the uniform stage number $N_1$ satisfying
(4.14).

Lemma 4.14 For every $\tau \in \mathcal{T}$ and every $n \ge \ldots$,

$$\frac{1}{n}\sum_{k=1}^{n} \mathbb{P}_{x_1,\sigma,\tau}(A_k) \le 5\varepsilon.$$

The proof of this lemma relies on the upcrossing property established in Lemma 4.13 and
takes the same form as Lemma 27 in Solan and Vieille [13]. Solan and Vieille make some
finiteness assumption (on the set of non-absorbing states on which the target function is not
bounded away from zero) in order to obtain the existence of a subset $X^1$ of $X^0$ and a uniform
bound $N_1 \ge 1$ such that

$$\forall x \in X^1 \text{ and } \tau \in \mathcal{T}: \quad \mathbb{P}_{x,\sigma^*(x),\tau}(\rho_\varepsilon > N_1) \le \varepsilon^3.$$

Under the assumption that $\{v_n, n \ge 1\}$ is totally bounded, we showed in Section 4.2.1 (cf. the
condition defined in (4.14)) that we can take $X^1$ to be the whole set $X^0$.

The following result is a reformulation of the submartingale property in Proposition 4.12.

Lemma 4.15 For any $m_0 \ge 1$, we have

$$\mathbb{E}_{x_1,\sigma,\tau}[v(x_{m_0})] \ge v(x_1) - \varepsilon^2\,\mathbb{E}_{x_1,\sigma,\tau}[N] - 2\,\mathbb{P}_{x_1,\sigma,\tau}(A_{m_0}) - \varepsilon.$$

Proof. We refer to Proposition 28 in Solan and Vieille, where our lemma is
stated as Equation (4) in their proof. ■


Now we use Lemma 4.13, Lemma 4.14 and Lemma 4.15 to prove the following proposition,
which concludes the proof of Theorem 3.1.


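Proposition 4.16 For any $x_1 \in X^0$ and for any $\tau$,

$$\mathbb{E}_{x_1,\sigma,\tau}\Big[\frac{1}{n}\sum_{m=1}^{n} g(x_m)\Big] \ge v(x_1) - 25\varepsilon, \quad \forall n \ge \frac{N_1}{\ldots}.$$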


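Proof. Take $x_1 \in X^0$ and fix any $\tau$. In this proof $h$ will denote a pure play. We use the fact
that $g(x_m) \ge v(x_m) - 2\varepsilon$ if $h \notin A_m$: indeed, either the play has absorbed so $g(x_m) = v(x_m)$, or
we have $v(x_m) \le 2\varepsilon$ and $g(x_m) = 0$. Moreover, if $h \in A_m$, we use $g(x_m) \ge -1$. This gives us:

$$\begin{aligned}
\mathbb{E}_{x_1,\sigma,\tau}\Big[\frac{1}{n}\sum_{m=1}^{n} g(x_m)\Big]
&\ge \frac{1}{n}\,\mathbb{E}_{x_1,\sigma,\tau}\Big[\sum_{m=1}^{n} \mathbf{1}_{h \notin A_m}\big(v(x_m) - 2\varepsilon\big) + \sum_{m=1}^{n} \mathbf{1}_{h \in A_m}(-1)\Big] \\
&\ge \frac{1}{n}\,\mathbb{E}_{x_1,\sigma,\tau}\Big[\sum_{m=1}^{n} v(x_m)\Big] + \frac{1}{n}\,\mathbb{E}_{x_1,\sigma,\tau}\Big[\sum_{m=1}^{n} \mathbf{1}_{h \in A_m}\big(-1 - v(x_m) + 2\varepsilon\big)\Big] - 2\varepsilon.
\end{aligned} \tag{4.15}$$

Lemma 4.15 (taking the average over $m_0 = 1, \ldots, n$) implies that

$$\frac{1}{n}\,\mathbb{E}_{x_1,\sigma,\tau}\Big[\sum_{m=1}^{n} v(x_m)\Big] \ge v(x_1) - \varepsilon^2 \cdot \mathbb{E}_{x_1,\sigma,\tau}[N] - \frac{2}{n}\sum_{m=1}^{n} \mathbb{P}_{x_1,\sigma,\tau}(A_m) - \varepsilon. \tag{4.16}$$

Moreover, the bound $v(x_m) \le 1$ gives

$$\frac{1}{n}\,\mathbb{E}_{x_1,\sigma,\tau}\Big[\sum_{m=1}^{n} \mathbf{1}_{h \in A_m}\big(-1 - v(x_m) + 2\varepsilon\big)\Big] \ge \frac{1}{n}\,\mathbb{E}_{x_1,\sigma,\tau}\Big[\sum_{m=1}^{n} \mathbf{1}_{h \in A_m}(-2 + 2\varepsilon)\Big] = (-2 + 2\varepsilon)\,\frac{1}{n}\sum_{m=1}^{n} \mathbb{P}_{x_1,\sigma,\tau}(A_m). \tag{4.17}$$

We substitute (4.16) and (4.17) back into (4.15) to obtain

$$\mathbb{E}_{x_1,\sigma,\tau}\Big[\frac{1}{n}\sum_{m=1}^{n} g(x_m)\Big] \ge v(x_1) - \varepsilon^2 \cdot \mathbb{E}_{x_1,\sigma,\tau}[N] - 3\varepsilon + (-4 + 2\varepsilon)\Big(\frac{1}{n}\sum_{m=1}^{n} \mathbb{P}_{x_1,\sigma,\tau}(A_m)\Big).$$

Finally, we use Lemma 4.13 and Lemma 4.14 in the inequality to obtain: $\forall n \ge \ldots$ and $\forall \varepsilon < \ldots$,

$$\mathbb{E}_{x_1,\sigma,\tau}\Big[\frac{1}{n}\sum_{m=1}^{n} g(x_m)\Big] \ge v(x_1) - \frac{\varepsilon^2}{\varepsilon - \varepsilon^2} - 3\varepsilon - 20\varepsilon \ge v(x_1) - 25\varepsilon.$$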


Note that $N_1$ does not depend on the particular choice of $x_1$ in $X^0$, so the strategy $\sigma$ uniformly
guarantees $v - 25\varepsilon$ in the infinite game $\Gamma$. ■

4.3 Pure optimal strategy (proof of Corollary 3.4)

To prove the result, it is sufficient to show that both the strategy $s^*$ and the strategy $\sigma^*$ defined
in the proof of Theorem 3.1 can be chosen pure and depending only on the history of states.

By assumption, the $n$-stage game $\Gamma_n(x)$ has a value in pure strategies. It follows that
Shapley's equation for any $v_n$ is satisfied with pure strategies, and so is Lemma 4.9. We deduce
that there exists a pure action $s^*$ that satisfies the conclusion of Corollary 4.10.

The construction of the strategy $\sigma^*$ appeared in the proof of Proposition 4.2, where it was
defined as the concatenation of a sequence of strategies $(\hat\sigma(x_{u_\ell}))$ at the random stages $(u_\ell)$.
As each $\hat\sigma(x)$ is optimal in the $n(x)$-stage game $\Gamma_{n(x)}(x)$, $\hat\sigma(x_{u_\ell})$ can be taken pure. The definition
of the random stages $u_\ell$ involved a randomized stopping time $k \in \{1, \ldots, n(x)\}$ satisfying:

$$\forall \tau \in \mathcal{T}, \quad \mathbb{E}_{x,\hat\sigma,\tau}[g(x_k)] \ge \min_{\tau'} \mathbb{E}_{x,\hat\sigma,\tau'}\Big[\frac{1}{n(x)}\sum_{t=1}^{n(x)} g(x_t)\Big].$$

To obtain a pure strategy $\sigma^*$, we show that the random stopping time $k$ can be replaced by a
pure stopping time, which depends only on the history of states and not on the actions. In
order to build this stopping time, we restrict ourselves to strategies in $\tilde\Sigma$, i.e., strategies which
depend only on past states. Note that each $\hat\sigma(x_{u_\ell})$, as an optimal strategy in $\Gamma_{n(x_{u_\ell})}(x_{u_\ell})$, can
be taken in $\tilde\Sigma$.


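Lemma 4.17 Fix any $\sigma \in \tilde\Sigma$ and $x_1$. For any $\tau \in \mathcal{T}$, there exists some $\tilde\tau \in \tilde{\mathcal{T}}$ such that
$\mathbb{P}_{x_1,\sigma,\tilde\tau}(x_1, \ldots, x_t) = \mathbb{P}_{x_1,\sigma,\tau}(x_1, \ldots, x_t)$ for any $(x_1, \ldots, x_t) \in X^t$, $t \ge 1$.

Proof. For all $t \ge 1$, we denote by $s_t := (x_1, \ldots, x_t)$ the $t$ first states. For any $\tau \in \mathcal{T}$, define the
reduced strategy $\tilde\tau \in \tilde{\mathcal{T}}$ as:

$$\tilde\tau_t(s_t) = \sum_{h_t \in H_t(s_t)} \mathbb{P}_{x_1,\sigma,\tau}(h_t \,|\, s_t)\,\tau_t(h_t), \quad \forall s_t,\ \forall t \ge 1,$$

where $H_t(s_t)$ denotes the histories in $H_t$ containing $s_t$. Then we obtain by definition:

$$\mathbb{P}_{x_1,\sigma,\tilde\tau}(s_{t+1}) = \mathbb{P}_{x_1,\sigma,\tau}(s_{t+1}).$$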


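Lemma 4.18 Fix any $x_1 \in X^0$ and $\sigma \in \tilde\Sigma$. For any $n \ge 1$, there exists a stopping time $\theta$ :
$\bigcup_{1 \le t \le n} X^t \to \{1, \ldots, n\}$ such that for every strategy $\tau$ of player 2:

$$\mathbb{E}_{x_1,\sigma,\tau}[g(x_\theta)] \ge \min_{\tau'} \mathbb{E}_{x_1,\sigma,\tau'}\Big[\frac{1}{n}\sum_{t=1}^{n} g(x_t)\Big].$$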


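Proof. By Lemma 4.17 we can assume that $\tau \in \tilde{\mathcal{T}}$. Let us prove the result by induction. For
every $x_1 \in X^0$, the result is true for $n = 1$. Suppose that the claim is true for $n - 1$. Let $x_1 \in X^0$.
By applying the inductive assumption to the different states possible at stage 2, we obtain that
there is some stopping time $\theta^+ : \bigcup_{t=1}^{n-1} X^t \to \{2, \ldots, n\}$ such that

$$\mathbb{E}_{x_1,\sigma,\tau}[g(x_{\theta^+}) \,|\, x_2] \ge \min_{\tau'} \mathbb{E}_{x_1,\sigma,\tau'}\Big[\frac{1}{n-1}\sum_{t=2}^{n} g(x_t) \,\Big|\, x_2\Big] := w_{n-1}(\sigma, x_1, x_2). \tag{4.18}$$

Denote $w_{n-1}(\sigma, x_1) = \inf_{\nu \in \Delta(J)} \mathbb{E}_{x_1,\sigma,\nu}[w_{n-1}(\sigma, x_1, x_2)]$. We define the stopping time $\theta : \bigcup_{t=1}^{n} X^t \to \{1, \ldots, n\}$ by

$$\forall (x_1, \ldots, x_t) \in X^t, \quad \theta(x_1, \ldots, x_t) = \begin{cases} 1 & \text{if } 0 \ge w_{n-1}(\sigma, x_1), \\ \theta^+(x_2, \ldots, x_t) & \text{otherwise.} \end{cases}$$

According to the definition of $\theta$ and the inductive assumption (4.18) for $\theta^+$:

$$\begin{aligned}
\mathbb{E}_{x_1,\sigma,\tau}[g(x_\theta)] &= g(x_1)\,\mathbf{1}_{0 \ge w_{n-1}(\sigma,x_1)} + \mathbb{E}_{x_1,\sigma,\tau}\big[\mathbb{E}_{x_1,\sigma,\tau}[g(x_{\theta^+}) \,|\, x_2]\big]\,\mathbf{1}_{0 < w_{n-1}(\sigma,x_1)} \\
&\ge \max\{0, w_{n-1}(\sigma, x_1)\} \\
&\ge \frac{n-1}{n}\,w_{n-1}(\sigma, x_1).
\end{aligned}$$

Finally $g(x_1) = 0$, therefore

$$\frac{n-1}{n}\,w_{n-1}(\sigma, x_1) = \inf_{\tau'} \mathbb{E}_{x_1,\sigma,\tau'}\Big[\frac{1}{n}\sum_{t=2}^{n} g(x_t)\Big] = \min_{\tau'} \mathbb{E}_{x_1,\sigma,\tau'}\Big[\frac{1}{n}\sum_{t=1}^{n} g(x_t)\Big].$$

This concludes the inductive proof.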
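
To make the recursive definition of the stopping time concrete, here is a minimal Python sketch under a strong simplifying assumption: both strategies are fixed, so the play is a Markov chain on the states and the minimum over player 2's strategies disappears. The transition table `P`, the payoffs `G` and the function names are hypothetical. The rule stops at stage 1 when the expected continuation average is non-positive and otherwise applies the rule with horizon $n-1$ from stage 2.

```python
from fractions import Fraction as F
from functools import lru_cache

# Hypothetical toy chain: states 0 and 1 are active (payoff 0);
# states 2 and 3 are absorbing with payoffs +1 and -1.
P = {
    0: {0: F(1, 2), 1: F(1, 4), 2: F(1, 4)},
    1: {0: F(1, 4), 3: F(3, 4)},
    2: {2: F(1)},
    3: {3: F(1)},
}
G = {0: F(0), 1: F(0), 2: F(1), 3: F(-1)}
ABSORBING = {2, 3}


@lru_cache(maxsize=None)
def avg_payoff(n, x):
    """Expected average payoff over stages 1..n when the chain starts at x."""
    if x in ABSORBING or n == 1:
        return G[x]
    cont = sum(p * avg_payoff(n - 1, y) for y, p in P[x].items())
    return F(n - 1, n) * cont  # the stage-1 payoff is 0 on active states


@lru_cache(maxsize=None)
def stop_payoff(n, x):
    """Expected payoff g(x_theta) under the recursive stopping rule theta."""
    if x in ABSORBING or n == 1:
        return G[x]
    # w plays the role of w_{n-1}(sigma, x_1) in this simplified setting.
    w = sum(p * avg_payoff(n - 1, y) for y, p in P[x].items())
    if w <= 0:
        return G[x]  # theta = 1: stop immediately on an active state
    return sum(p * stop_payoff(n - 1, y) for y, p in P[x].items())
```

In this toy chain one can check that `stop_payoff(n, x) >= avg_payoff(n, x)` for small horizons, which is the inequality of Lemma 4.18 without the adversary.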


Remark 4.19 Let $\Gamma$ be a stochastic game where the payoff function depends only on the state
but not on the actions. The proof of the above result then follows in the same way.


5 Application to recursive games with signals 

In this last section, we apply our result to the model of finite recursive games with signals where
one player is more informed than the other player. Introducing an auxiliary stochastic game
similar to the one defined in Gensbittel et al. [3], we show that the study of such a recursive
game can be reduced to the study of a recursive game with a countable state space satisfying the
assumption of Corollary 3.4.

5.1 Model 

The following model of general repeated game is introduced in Mertens et al. [8]. A repeated
game $\Gamma = (K, I, J, C, D, g, q)$ is given by

• a finite state space: $K$.

• two finite action spaces $I$ and $J$.

• two finite signal spaces $C$ and $D$.

• a payoff function: $g : K \times I \times J \to [-1, +1]$.

• a transition probability function (on states and signals): $q$ from $K \times I \times J$ to $\Delta(K \times C \times D)$.

Denote by $\Gamma(\pi)$ the game with an initial probability distribution $\pi \in \Delta(K \times C \times D)$, which is
played as follows. Initially, the triple $(k_1, c_1, d_1)$ is drawn according to $\pi$. At stage 1: player
1 learns $c_1$ and player 2 learns $d_1$. Then simultaneously player 1 chooses an action $i_1 \in I$ and
player 2 chooses an action $j_1 \in J$. The stage payoff is $g(k_1, i_1, j_1)$ and the new triple $(k_2, c_2, d_2)$
is drawn according to $q(k_1, i_1, j_1)$. The game then proceeds to stage 2: player 1 observes $c_2$,
player 2 observes $d_2$, etc.

We assume that each player's signal contains his own action. Formally, there exist $i : C \to I$
and $j : D \to J$ such that

$$\forall k \in K, \quad \sum_{k',c,d} q(k, i(c), j(d))(k', c, d) = 1.$$

We will focus on repeated games with the following two features: the game is recursive and one player is
more informed than the other.

Definition 5.1 The repeated game $\Gamma$ is recursive if there exist $K^0$ and $K^*$, a partition of $K$,
such that:

• the stage payoff is 0 on active states: $\forall (k,i,j) \in K^0 \times I \times J$, $g(k,i,j) = 0$.

• states in $K^*$ are absorbing: $\forall k \in K^*$, $\sum_{c \in C, d \in D} q(k,i,j)(k,c,d) = 1$ for all $(i,j) \in I \times J$, and
$g(k,i,j)$ depends only on $k$.

In the rest of the paper, a recursive repeated game will be called a recursive game with signals.

Definition 5.2 Player 1 is more informed than player 2 in the recursive game $\Gamma$ if there exists
a mapping $\hat d : C \to D$ such that, if $E$ denotes $\{(k,c,d) \in K \times C \times D,\ \hat d(c) = d\}$, then

$$q(k,i,j)(E) = 1, \quad \forall (k,i,j) \in K \times I \times J.$$

Notation 5.3 We denote by: $\Delta^1(K \times C \times D) = \{\pi \in \Delta(K \times C \times D) : \pi(E) = 1\}$.

We define similarly that player 2 is more informed than player 1. Whenever player 1 is more
informed than player 2 and player 2 is more informed than player 1, $\Gamma$ is a repeated game with
symmetric signals. We denote by $\Delta^*(K \times C \times D)$ the set of symmetric initial distributions.


Remark 5.4 By assumption, if player 1 is more informed than player 2, he learns in particular
the action played by player 2 since it is included in the signal of player 2. Player 2 is in general
not informed of the action played by player 1.

In Gensbittel et al. [3], the authors considered a weaker notion of "a more informed player"
but they made a different assumption on the transition function, namely that the less informed
player has no influence on the evolution of beliefs of both players. It is not clear if our result still
holds under this weaker assumption.

5.2 Evaluation


At stage $t$, the space of past histories of player 1 is $H_t^1 = (C \times I)^{t-1} \times C$ and the space of past
histories of player 2 is $H_t^2 = (D \times J)^{t-1} \times D$. Let $H_\infty = (K \times C \times D \times I \times J)^\infty$ be the space of
infinite plays. For any play $h = (k_s, c_s, d_s, i_s, j_s)_{s \ge 1}$, we denote by $h_t$ its projection on $H_t$, by $h_t^1$
its projection on $H_t^1$, and by $h_t^2$ its projection on $H_t^2$.

A (behavior) strategy for player 1 is a sequence $\sigma = (\sigma_t)_{t \ge 1}$ of functions $\sigma_t : H_t^1 \to \Delta(I)$. A
(behavior) strategy for player 2 is a sequence $\tau = (\tau_t)_{t \ge 1}$ of functions $\tau_t : H_t^2 \to \Delta(J)$. We denote
by $\Sigma$ and $\mathcal{T}$ the players' respective sets of strategies. An initial distribution $\pi \in \Delta(K \times C \times D)$ and
a couple of strategies $(\sigma, \tau)$ define a probability distribution over the set of infinite plays, which
we denote by $\mathbb{P}^\pi_{\sigma,\tau}$. Let $\mathbb{E}^\pi_{\sigma,\tau}$ be the expectation with respect to $\mathbb{P}^\pi_{\sigma,\tau}$.

For any given $\pi \in \Delta(K \times C \times D)$, let $\gamma_n(\pi, \sigma, \tau)$ (resp. $\gamma_\lambda(\pi, \sigma, \tau)$) be the expected $n$-stage
payoff (resp. $\lambda$-discounted payoff) associated with $(\sigma, \tau) \in \Sigma \times \mathcal{T}$. We denote by $v_n(\pi)$ the
$n$-stage value and by $v_\lambda(\pi)$ the $\lambda$-discounted value.
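
As an illustration of the two evaluations, consider a single play of a recursive game in which the payoff is 0 before absorption and constant afterwards. The following is a minimal Python sketch under that assumption; the function names and the parameters `absorb_stage` and `payoff` are hypothetical and only serve the example. It uses the closed forms of the Cesàro mean and of the Abel mean with weight $\lambda(1-\lambda)^{t-1}$ on stage $t$.

```python
from fractions import Fraction as F


def n_stage_average(absorb_stage, payoff, n):
    """Average payoff over stages 1..n of a play paying 0 before
    `absorb_stage` and `payoff` at every stage from `absorb_stage` on."""
    return payoff * F(max(0, n - absorb_stage + 1), n)


def discounted_average(absorb_stage, payoff, lam):
    """Abel mean sum_t lam * (1 - lam)**(t - 1) * g_t of the same play.

    The tail sum over t >= absorb_stage equals (1 - lam)**(absorb_stage - 1).
    """
    return payoff * (1 - lam) ** (absorb_stage - 1)
```

For a fixed absorption stage both quantities tend to the absorbing payoff, as $n \to \infty$ and as $\lambda \to 0$ respectively, which is the behaviour the asymptotic value captures at the level of values.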

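Definition 5.5 Given an initial distribution $\pi \in \Delta(K \times C \times D)$, the game $\Gamma(\pi)$ has an asymptotic
value $v(\pi)$ if:

$$v(\pi) = \lim_{n \to \infty} v_n(\pi) = \lim_{\lambda \to 0} v_\lambda(\pi).$$

Definition 5.6 Given an initial distribution $\pi \in \Delta(K \times C \times D)$, the game $\Gamma(\pi)$ has a uniform
maxmin $\underline{v}_\infty(\pi)$ if:

• Player 1 can guarantee $\underline{v}_\infty(\pi)$, i.e., for all $\varepsilon > 0$ there exist a strategy $\sigma^* \in \Sigma$ of player 1
and $n_0 \ge 1$ such that

$$\forall n \ge n_0,\ \forall \tau \in \mathcal{T}, \quad \gamma_n(\pi, \sigma^*, \tau) \ge \underline{v}_\infty(\pi) - \varepsilon.$$

• Player 2 can defend $\underline{v}_\infty(\pi)$, i.e., for all $\varepsilon > 0$ and for every strategy $\sigma \in \Sigma$ of player 1, there
exist $n_0 \ge 1$ and $\tau^* \in \mathcal{T}$ such that

$$\forall n \ge n_0, \quad \gamma_n(\pi, \sigma, \tau^*) \le \underline{v}_\infty(\pi) + \varepsilon.$$

The uniform minmax $\overline{v}_\infty(\pi)$ of the game $\Gamma(\pi)$ is defined similarly: player 2 can guarantee
$\overline{v}_\infty(\pi)$ and player 1 can defend $\overline{v}_\infty(\pi)$.

Definition 5.7 Given an initial distribution $\pi \in \Delta(K \times C \times D)$, we say that $\Gamma(\pi)$ has a uniform
value if both $\underline{v}_\infty(\pi)$ and $\overline{v}_\infty(\pi)$ exist and are equal. Whenever the uniform value exists, we denote
it by $v_\infty(\pi)$.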

5.3 Results 

Theorem 5.8 Let $\Gamma$ be a recursive game such that player 1 is more informed than player 2. Then
for every distribution $\pi \in \Delta^1(K \times C \times D)$, both the asymptotic value and the uniform maxmin
exist and are equal:

$$\underline{v}_\infty(\pi) = \lim_{n \to \infty} v_n(\pi) = \lim_{\lambda \to 0} v_\lambda(\pi).$$

By symmetry, we deduce a similar result by exchanging the roles of player 1 and player 2. When
the information is symmetric, both results are true and we obtain the existence of the uniform
value.

Corollary 5.9 Let $\Gamma$ be a recursive game with symmetric signals. Then for every $\pi \in \Delta^*(K \times
C \times D)$, the game $\Gamma(\pi)$ has a uniform value.

It is known from Ziliotto [19] that stochastic games with symmetric signals may have no uniform
value. Therefore recursive games have very particular properties. It is a challenging task to
identify the subclass of repeated games with $\underline{v}_\infty(\pi) = \lim v_n(\pi) = \lim v_\lambda(\pi)$.

Remark 5.10 Note that we have assumed that the stage payoff on absorbing states does not
depend on the actions played. Under this assumption, players' strategies only have an influence
on non-absorbing plays. Therefore, without loss of generality, we assume in the following that
players observe whenever an absorption occurs and in which state it is.

If we consider that the payoff in absorbing states still depends on the actions played, then our
proof does not work. Indeed the auxiliary game introduced in Proposition 5.16 is not recursive
anymore. The result $\underline{v}_\infty(\pi) = \lim v_n(\pi) = \lim v_\lambda(\pi)$ is unknown for this general case.

Remark 5.11 It is not known whether recursive games with any structure of signals have a
uniform value. As highlighted in Rosenberg and Vieille [12], the equicontinuity of the $\lambda$-discounted
value functions is sufficient in order to deduce the existence of the uniform value for recursive
games (with perfect observations). For a recursive game with any structure of signals, one can
introduce the game associated with a universal belief space but we do not know a metric on this
space such that the $\lambda$-discounted values or the $n$-stage values are equicontinuous/totally bounded.

5.4 Proof of Theorem 5.8

We introduce some notations concerning different belief hierarchies. Denote by $B_1 = \Delta(K)$ the set
of beliefs of player 1 on the state variable. Denote by $B_2 = \Delta_f(B_1) = \Delta_f(\Delta(K))$ the set of beliefs
of player 2 on the (first-order) beliefs of player 1. Finally, we denote by $\Delta_f(B_2) = \Delta_f(\Delta_f(\Delta(K)))$
the set of probability distributions over the second-order beliefs of player 2.

Overview of the proof 

We fix $\Gamma$ a recursive game with signals such that player 1 is more informed than player 2. The
first subsection presents general properties for repeated games with one player more informed
than the other. Given any $\pi \in \Delta^1(K \times C \times D)$, we can define the distribution of the beliefs of
player 2 on the beliefs of player 1 about the state. This defines a function $\Phi$ from $\Delta^1(K \times C \times D)$ to
$\Delta_f(B_2)$. Applying results in Gensbittel et al. [3], we know that $v_n(\pi)$ depends on $\pi$ only through
$\Phi(\pi)$. This enables us to show that the value function $v_n$, defined on $\Delta^1(K \times C \times D)$, induces
a canonical function $\hat v_n$ defined on $B_2$ such that $v_n(\pi) = \hat v_n(\Phi(\pi))$ and the family $\{\hat v_n, n \ge 1\}$ is
totally bounded.

In the second subsection, we introduce an auxiliary recursive game $\mathcal{G}$ which is defined on $B_2$
and is played with pure actions. We prove in Proposition 5.17 that the $n$-stage value of $\mathcal{G}_n$ is
equal to $\hat v_n$. Therefore, $\mathcal{G}$ satisfies the conditions of Corollary 3.4 and it has a uniform value $w_\infty$.
It follows that $\Gamma(\pi)$ has an asymptotic value equal to $w_\infty(\Phi(\pi))$.

The third subsection proves (cf. Proposition 5.22) that player 1 can uniformly guarantee
$w_\infty(\Phi(\pi))$ in $\Gamma(\pi)$ by mimicking uniform $\varepsilon$-optimal strategies in $\mathcal{G}(\Phi(\pi))$.

The last subsection proves (cf. Proposition 5.25) that player 2 can uniformly defend
$w_\infty(\Phi(\pi))$ by introducing a second auxiliary recursive game $\mathcal{R}$.

5.4.1 Canonical value function $\hat v_n$

We follow in this subsection Gensbittel et al. [3] to introduce the canonical function $\hat v_n$. Note
that the results in this subsection are obtained in Gensbittel et al. [3] without using the additional
assumption that player 1 controls the transition (made later in their paper).

For convenience, we extend the definition of $\Gamma(\pi)$ to a larger family of initial probability
distributions. Given any two finite sets $C'$ and $D'$ and $\pi \in \Delta^1(K \times C' \times D')$, $\Gamma(\pi)$ is the game
where $(k, c', d')$ is drawn at stage 1 according to $\pi$, player 1 observes $c'$, player 2 observes $d'$
(which is contained in $c'$, $\pi$-a.s.) and then from stage 2 on, the game is played as previously
described with signals in $C$ and $D$.

For any random variable $\xi$ defined on a probability space $(\Omega, \mathcal{A}, P)$ and $\mathcal{F}$ a sub $\sigma$-algebra of
$\mathcal{A}$, let $\mathcal{L}_P(\xi \,|\, \mathcal{F})$ denote the conditional distribution of $\xi$ given $\mathcal{F}$, which is seen as an $\mathcal{F}$-measurable
random variable$^3$, and let $\mathcal{L}_P(\xi)$ denote the distribution of $\xi$.

Notation 5.12 For every strategy profile $(\sigma, \tau) \in \Sigma \times \mathcal{T}$, we denote the first-order belief of player
1 on $K$ at stage $n$ given $h_n^1$ by $p_n \in B_1$, the second-order belief of player 2, i.e., his belief about the
belief of player 1 on $K$ at stage $n$ given $h_n^2$, by $x_n \in B_2$, and the distribution of $x_n$ by $\eta_n \in \Delta_f(B_2)$,
i.e.,

$$p_n = \mathcal{L}_{\mathbb{P}^\pi_{\sigma,\tau}}(k_n \,|\, h_n^1), \quad x_n = \mathcal{L}_{\mathbb{P}^\pi_{\sigma,\tau}}(p_n \,|\, h_n^2), \quad \text{and} \quad \eta_n = \mathcal{L}_{\mathbb{P}^\pi_{\sigma,\tau}}(x_n).$$

Notation 5.13 For any $\pi \in \Delta^1(K \times C' \times D')$ where $C'$ and $D'$ are two finite sets, the image of
$\pi$ is given by the following function in $\Delta_f(B_2)$:

$$\Phi(\pi) = \mathcal{L}_\pi\big(\mathcal{L}_\pi(\mathcal{L}_\pi(k_1 \,|\, c_1) \,|\, d_1)\big) = \sum_{d \in D'} \pi(d)\,\delta_{\left(\sum_{c \in C'} \pi(c|d)\,\delta_{\pi(\cdot|c,d)}\right)}.$$

The interpretation of $\Phi(\pi)$ is as follows: with probability $\pi(d)$, player 2 observes the signal $d$
and believes that player 1 received the signal $c$ with probability $\pi(c|d)$, and therefore that player 1's
belief over $K$ is $\pi(\cdot|c,d)$.

The assumptions imply that if $\pi \in \Delta^1(K \times C' \times D')$, then $\pi$ satisfies the following two properties:

P1) $\pi(c)\,\pi(k,c,d) = \pi(k,c)\,\pi(c,d)$, $\forall (k,c,d) \in K \times C' \times D'$.

P2) There exists a map $f_1 = f_1^\pi : C' \to B_2$ such that $x_1 = f_1(c_1)$, $\pi$-almost surely.

Under P1) and P2), Proposition 1 of Gensbittel et al. [3] applies and we obtain the following
result, which states that the value of any $n$-stage game depends on any initial distribution $\pi$ only
through its image $\Phi(\pi)$.

Proposition 5.14 [Gensbittel et al. 2014] Let $C'$ and $D'$ be two finite sets. Let $\pi, \pi' \in
\Delta^1(K \times C' \times D')$ and let $n \ge 1$. If $\Phi(\pi) = \Phi(\pi')$, then $v_n(\pi) = v_n(\pi')$.

$^3$ All random variables appearing here take only finitely many values so that the definition of conditional laws
does not require any additional care about measurability.

Reciprocally, given $\eta \in \Delta_f(B_2)$, let us construct a canonical distribution $\pi$ satisfying $\Phi(\pi) = \eta$.


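The canonical game $\hat\Gamma(\eta)$. Given $\eta \in \Delta_f(B_2)$, define two finite sets $D' := \mathrm{supp}(\eta) \subset B_2$ and
$C' := D' \times \big(\bigcup_{x \in \mathrm{supp}(\eta)} \mathrm{supp}(x)\big)$, and a probability distribution $\pi(\eta) \in \Delta(K \times C' \times D')$ by

$$\forall (k,p) \in K \times B_1,\ x, x' \in B_2, \quad \pi(k, (p,x), x') := \begin{cases} \eta(x)\,x(p)\,p(k) & \text{if } x = x', \\ 0 & \text{if } x \ne x'. \end{cases}$$

By construction, $\pi(\eta)$ can be seen as an element of $\Delta^1(K \times C' \times D')$, and satisfies $\Phi(\pi(\eta)) = \eta$.
The canonical game of $\Gamma(\pi)$ is denoted by $\hat\Gamma(\eta)$. Its value, denoted by $\hat v_n(\eta)$, is equal to $v_n(\pi(\eta))$,
the value of $\Gamma_n(\pi(\eta))$. If $\eta = \delta_x$ for some $x \in B_2$, we write $\hat v_n(x)$ for $\hat v_n(\delta_x)$.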
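
The image mapping and the canonical distribution can be checked on a small example. The following is a minimal Python sketch, assuming finite distributions stored as dictionaries with exact rational weights; the helper names `freeze`, `phi` and `canonical` are hypothetical, and beliefs are encoded as frozen sets of (atom, weight) pairs so that they can serve as dictionary keys.

```python
from collections import defaultdict
from fractions import Fraction as F


def freeze(dist):
    """Hashable form of a finite distribution given as {atom: weight}."""
    return frozenset((a, w) for a, w in dist.items() if w != 0)


def phi(pi):
    """Image of pi, a dict {(k, c, d): weight}: law of player 2's second-order belief."""
    pi_d = defaultdict(F)    # marginal pi(d)
    pi_cd = defaultdict(F)   # marginal pi(c, d)
    for (k, c, d), w in pi.items():
        pi_d[d] += w
        pi_cd[(c, d)] += w
    eta = defaultdict(F)
    for d, wd in pi_d.items():
        if wd == 0:
            continue
        x = defaultdict(F)   # sum_c pi(c|d) * delta_{pi(.|c,d)}
        for (c, d2), wcd in pi_cd.items():
            if d2 == d and wcd != 0:
                p = freeze({k: w / wcd for (k, c3, d3), w in pi.items()
                            if (c3, d3) == (c, d)})
                x[p] += wcd / wd
        eta[freeze(x)] += wd
    return dict(eta)


def canonical(eta):
    """pi(eta): weight eta(x) * x(p) * p(k) on (k, (p, x), x), and 0 elsewhere."""
    pi = {}
    for x, wx in eta.items():
        for p, wp in x:
            for k, wk in p:
                pi[(k, (p, x), x)] = wx * wp * wk
    return pi
```

With this encoding, `phi(canonical(eta)) == eta` on the example used in the tests, in line with the identity $\Phi(\pi(\eta)) = \eta$ stated above.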

Informally, the game $\hat\Gamma(\eta)$ proceeds as follows: $\eta$ is common knowledge, player 2 is informed
about the realization $x$ of a random variable with law $\eta$ (player 2 learns his beliefs). Then player
1 is informed about $x$ (his opponent's beliefs) and about the realization $p$ of a random variable
with law $x$ (his own beliefs). The state variable is finally chosen according to $p$, but no player
observes it.

By the above construction, one obtains that $v_n(\pi) = \hat v_n(\Phi(\pi))$ for any $\pi \in \Delta^1(K \times C' \times D')$.


The result below follows from Proposition 2 of Gensbittel et al. [3]. The Wasserstein metric
$d$ on $B_2 = \Delta_f(B_1)$ is defined by:

$$\forall x, y \in B_2, \quad d(x,y) = \sup_{f \in \mathcal{D}} \Big| \int_{B_1} f(p)\,x(dp) - \int_{B_1} f(p)\,y(dp) \Big|,$$

where $\mathcal{D}$ is the set of 1-Lipschitz functions from $(B_1, \|\cdot\|_1)$ to $[-1,1]$.

Proposition 5.15 [Gensbittel et al. 2014] Let $\eta \in \Delta_f(B_2)$, $n \ge 1$ and let $x \in B_2$. Then, $\hat v_n(\eta)$ is
linear on $\Delta_f(B_2)$ and, as a mapping on $B_2$, $\hat v_n(x)$ is 1-Lipschitz for the Wasserstein metric $d$.

Since the state space $B_2$ is totally bounded for the Wasserstein metric, we deduce by the Arzelà-
Ascoli theorem that the set of functions $\{\hat v_n, n \ge 1\}$ is totally bounded.


5.4.2 Auxiliary recursive game and asymptotic value 

Let $\mathcal{G} = (X, A, B, G, \ell)$ be the stochastic game played in pure strategies, defined by:

• the state space $X = \Delta_f(\Delta(K))$ (endowed with the Wasserstein metric $d$).

• the action space $A = \{f : \Delta(K) \to \Delta(I)\}$ and for all $x \in X$, $A(x) = \{\mathrm{supp}(x) \to \Delta(I)\}$ for
player 1.

• the action space $B = \Delta(J)$ for player 2.

• the payoff function $G : X \to [-1,1]$, defined for any $x \in X$ by $G(x) := \sum_{p \in \Delta(K)} g(p)\,x(p)$.

• the transition function $\ell : X \times A \times B \to \Delta_f(X)$ defined as $\ell(x,a,b) := \Phi(Q(x,a,b))$. Here,
$Q(x,a,b) \in \Delta_f(K \times (\Delta(K) \times C) \times D)$ is the joint distribution of $(k_2, (p, c_2), d_2)$ in the
canonical game $\hat\Gamma(\delta_x)$ when the players play $(\sigma_1, \tau_1) = (a, b)$ at stage 1. The sets $K$, $C$, $D$
and $\mathrm{supp}(x)$ being finite, $Q$ can be seen as an element of $\Delta^1(K \times C' \times D')$ with $C'$ a finite
subset of $\Delta(K) \times C$ and $D' = D$.

For any $x \in X$, we denote by $\mathcal{G}(x)$ the game starting at $x$. We extend the definition to $\mathcal{G}(z)$
for any $z \in \Delta_f(X)$, in which the initial state is chosen randomly according to $z$.


Since players observe when and where absorption occurs, their beliefs (first and second-order)
are either supported on $K^0$ (therefore respectively in $\Delta(K^0)$ and in $\Delta_f(\Delta(K^0))$) or supported
on a single point $k \in K^*$ (namely $\delta_k$ and $\delta_{\delta_k}$).

Proposition 5.16 Let $X_r = \Delta_f(\Delta(K^0)) \cup \{\delta_{\delta_k} : k \in K^*\}$. The game $\mathcal{G}_r = (X_r, A, B, G, \ell)$ with
the state space $X_r$ is well defined and is recursive with the absorbing states $\{\delta_{\delta_k} : k \in K^*\}$.

In the following, we identify each $\delta_{\delta_k}$ with $k$ itself for any $k \in K^*$, and write $X_r = \Delta_f(\Delta(K^0)) \cup
K^*$. By abuse of notation, we write again $X$ for $X_r$ and $\mathcal{G}$ for $\mathcal{G}_r$.


Proposition 5.17 For every $n \ge 1$, the $n$-stage game $\mathcal{G}_n$ has a value $w_n$ in pure strategies.
Moreover, for every $x \in X$, $w_n(x) = \hat v_n(x)$.


Proof. We prove the result by induction on $n \ge 1$. Let $n = 1$. Given $x \in X$, the game $\mathcal{G}_1(x)$ has a
value $w_1(x)$ and it is equal to $w_1(x) = G(x) = \sum_p g(p)\,x(p)$. It is equal to $\hat v_1(x)$ by construction.
This initializes our induction. Let $n \ge 1$ be such that $w_n$, the value of $\mathcal{G}_n$, exists in pure strategies,
and for every $x \in X$, $w_n(x) = \hat v_n(x)$. Gensbittel et al. [3] showed in the proof of their Proposition
5 that the family $\{\hat v_n, n \ge 1\}$ satisfies the Shapley equation: for every $x \in X$ and for every $n \ge 1$,

$$\begin{aligned}
\hat v_{n+1}(x) &= \sup_{a \in A(x)} \inf_{b \in B} \mathbb{E}_{\ell(x,a,b)}\Big[\frac{1}{n+1}\,g(x,a,b) + \frac{n}{n+1}\,\hat v_n(x')\Big] \\
&= \inf_{b \in B} \sup_{a \in A(x)} \mathbb{E}_{\ell(x,a,b)}\Big[\frac{1}{n+1}\,g(x,a,b) + \frac{n}{n+1}\,\hat v_n(x')\Big],
\end{aligned}$$

where the random variable $x' \in X$ is chosen according to the law $\ell(x,a,b)(\cdot)$. By the inductive assumption,
we can replace $\hat v_n$ by $w_n$ on the right hand side of the above equation, to obtain that:

$$\begin{aligned}
\forall x \in X, \quad \hat v_{n+1}(x) &= \sup_{a \in A(x)} \inf_{b \in B} \mathbb{E}_{\ell(x,a,b)}\Big[\frac{1}{n+1}\,g(x) + \frac{n}{n+1}\,w_n(x')\Big] \\
&= \inf_{b \in B} \sup_{a \in A(x)} \mathbb{E}_{\ell(x,a,b)}\Big[\frac{1}{n+1}\,g(x) + \frac{n}{n+1}\,w_n(x')\Big].
\end{aligned}$$

We now use the above equation to show that both players can guarantee $\hat v_{n+1}(x)$ in $\mathcal{G}_{n+1}(x)$ in
pure strategies. Let $x \in X$ be fixed and $a^*$ be an action of player 1 such that

$$\inf_{b \in B} \mathbb{E}_{\ell(x,a^*,b)}\Big[\frac{1}{n+1}\,g(x) + \frac{n}{n+1}\,w_n(x')\Big] \ge \hat v_{n+1}(x). \tag{5.1}$$

Again by the inductive assumption, let $\sigma_n^*(x')$ be an optimal pure strategy in $\mathcal{G}_n(x')$, $\forall x' \in X$. We
define the strategy $\sigma_{n+1}^*(x)$ to play $a^*$ at the first stage and then $\sigma_n^*(x')$, where $x'$ is the current
state at stage 2. The strategy $\sigma_{n+1}^*(x)$ is pure and guarantees player 1 a payoff in $\mathcal{G}_{n+1}(x)$ no smaller than
the left hand side of Equation (5.1), hence $\hat v_{n+1}(x)$. A similar construction for player 2 finishes
the inductive proof. ■


Therefore, by Proposition 5.15 the family of $n$-stage values $\{w_n\}$ is totally bounded for the
uniform norm, and we can apply Corollary 3.4 (with infinite sets of actions) to the game $\mathcal{G}$.

Proposition 5.18 For every $z \in \Delta_f(X)$, the game $\mathcal{G}(z)$ has a uniform value denoted by $w_\infty(z)$.
Moreover both players can uniformly guarantee the value with pure strategies that depend on the
history of states but not on the past actions.

Since $v_n(\pi) = \hat v_n(\Phi(\pi)) = w_n(\Phi(\pi))$, and the same construction of the canonical value
function $\hat v_\lambda$ implies $v_\lambda(\pi) = \hat v_\lambda(\Phi(\pi)) = w_\lambda(\Phi(\pi))$, we deduce the existence of the asymptotic
value in the game $\Gamma(\pi)$ for every $\pi \in \Delta^1(K \times C \times D)$.

Proposition 5.19 For every $\pi \in \Delta^1(K \times C \times D)$, we have

$$\lim_{n \to \infty} v_n(\pi) = \lim_{\lambda \to 0} v_\lambda(\pi) = w_\infty(\Phi(\pi)).$$

5.4.3 Player 1 uniformly guarantees $w_\infty$

We first show that player 1 is able to compute, in the original game, his first-order beliefs $(p_t)_{t \ge 1}$
and the second-order beliefs $(x_t)_{t \ge 1}$ of player 2 without knowing the strategy of player 2.

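Lemma 5.20 Let $(\sigma, \tau)$ be a pair of strategies in $\Gamma(\pi)$. For every $t \ge 1$, $p_t = \mathcal{L}_{\mathbb{P}^\pi_{\sigma,\tau}}(k_t \,|\, h_t^1)$ and
$x_t = \mathcal{L}_{\mathbb{P}^\pi_{\sigma,\tau}}(p_t \,|\, h_t^2)$ are independent of $\tau$ for all $h_t^1, h_t^2$.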

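Proof. Let $(\sigma, \tau)$ be a pair of strategies and $\pi \in \Delta(K \times C \times D)$; we write $\mathbb{P} := \mathbb{P}^\pi_{\sigma,\tau}$ for short.
Let $h = (k_s, c_s, d_s, i_s, j_s)_{s \ge 1} \in H_\infty$. For any $t \ge 1$, we define

$$\bar{\mathbb{P}}(h_t) = \pi(k_1, c_1, d_1) \prod_{\ell=1}^{t-1} q(k_\ell, i_\ell, j_\ell)(k_{\ell+1}, c_{\ell+1}, d_{\ell+1})$$

with the convention $\bar{\mathbb{P}}(k_1, c_1, d_1) = \pi(k_1, c_1, d_1)$. These notations help to write

$$\mathbb{P}(h_t) = \bar{\mathbb{P}}(h_t) \prod_{\ell=1}^{t-1} \sigma_\ell(h_\ell^1)[i_\ell]\,\tau_\ell(h_\ell^2)[j_\ell].$$

The key point is that under $\bar{\mathbb{P}}(\cdot)$, $i_{t-1}$, $j_{t-1}$ and $d_t$ are $c_t$-measurable whereas $j_{t-1}$ is $d_t$-measurable.
It follows that after observing $(c_1, \ldots, c_t)$, player 1's belief is:

$$\begin{aligned}
p_t(k_t) = \mathbb{P}(k_t \,|\, c_1, \ldots, c_t) &= \mathbb{P}(k_t \,|\, c_1, d_1, i_1, j_1, \ldots, c_t, d_t) \\
&= \frac{\sum_{k_1', \ldots, k_{t-1}'} \mathbb{P}(k_1', c_1, d_1, i_1, j_1, \ldots, k_t, c_t, d_t)}{\sum_{k_1', \ldots, k_{t-1}', k_t'} \mathbb{P}(k_1', c_1, d_1, i_1, j_1, \ldots, k_t', c_t, d_t)} \\
&= \frac{\sum_{k_1', \ldots, k_{t-1}'} \bar{\mathbb{P}}(k_1', c_1, d_1, i_1, j_1, \ldots, k_t, c_t, d_t)}{\sum_{k_1', \ldots, k_{t-1}', k_t'} \bar{\mathbb{P}}(k_1', c_1, d_1, i_1, j_1, \ldots, k_t', c_t, d_t)},
\end{aligned}$$

which depends on neither $\sigma$ nor $\tau$. We now consider $x_t = \mathcal{L}_{\mathbb{P}}(p_t \,|\, d_1, \ldots, d_t)$ for a given observed
history $h_t^2 = (d_1, \ldots, d_t)$ of player 2, which is decomposed as:

$$x_t = \mathcal{L}_{\mathbb{P}}\big(\mathcal{L}_{\mathbb{P}}(k_t \,|\, c_1, \ldots, c_t) \,\big|\, d_1, \ldots, d_t\big) = \sum_{c_1', \ldots, c_t'} \mathbb{P}(c_1', \ldots, c_t' \,|\, d_1, \ldots, d_t)\,\delta_{\mathcal{L}_{\mathbb{P}}(k_t | c_1', \ldots, c_t')}.$$

By the previous result that $\mathcal{L}_{\mathbb{P}}(k_t \,|\, c_1', \ldots, c_t')$ does not depend on $\tau$, it is sufficient to prove that
$\mathbb{P}(c_1', \ldots, c_t' \,|\, d_1, \ldots, d_t)$ is independent of $\tau$. Let us consider a sequence of signals $(c_1', \ldots, c_t')$ inducing
$(d_1, \ldots, d_t)$, that we complete with $(i_1', j_1, \ldots, i_{t-1}', j_{t-1})$, the sequence of actions it contains. This gives us

$$\begin{aligned}
\mathbb{P}(c_1', \ldots, c_t' \,|\, d_1, \ldots, d_t) &= \mathbb{P}(c_1', d_1, i_1', j_1, \ldots, c_t', d_t \,|\, d_1, j_1, \ldots, d_t) = \frac{\mathbb{P}(c_1', d_1, i_1', j_1, \ldots, c_t', d_t)}{\mathbb{P}(d_1, j_1, \ldots, d_t)} \\
&= \frac{\sum_{k_1', \ldots, k_t'} \bar{\mathbb{P}}(h_t') \prod_{\ell=1}^{t-1} \sigma_\ell(h_\ell'^{1})[i_\ell']}{\sum_{c_1'', \ldots, c_t''} \sum_{k_1'', \ldots, k_t''} \bar{\mathbb{P}}(h_t'') \prod_{\ell=1}^{t-1} \sigma_\ell(h_\ell''^{1})[i_\ell'']},
\end{aligned}$$

where $h_\ell' = (k_1', c_1', d_1, i_1', j_1, \ldots, k_\ell', c_\ell', d_\ell)$ is the history of stage $\ell$, $h_\ell'^{1} = (c_1', i_1', \ldots, c_\ell')$ is the
private history of player 1 of stage $\ell$, and the sum in the denominator ranges over the signal sequences of
player 1 inducing $(d_1, \ldots, d_t)$, the factors involving $\tau$ being common to the numerator and the denominator.
The right hand side of the above equation does not depend
on the strategy of player 2 and the result is obtained. ■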
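
The computation of player 1's first-order belief can be sketched as a forward Bayes filter. The following minimal Python example is an illustration under assumptions, not the authors' procedure: the transition `q` is a hypothetical dictionary mapping `(k, i, j)` to a distribution over `(k', c', d')`, and the actions `i`, `j` are passed explicitly rather than read from the signals. No strategy appears among the arguments, which mirrors the statement of Lemma 5.20.

```python
from fractions import Fraction as F


def update_belief(p, q, i, j, c_next, d_next):
    """One Bayes step for player 1's belief on the state.

    p: current belief {k: weight}; q: {(k, i, j): {(k2, c2, d2): weight}}.
    Returns the belief on the next state given the actions (i, j) and the
    next signals (c_next, d_next).
    """
    joint = {}
    for k, pk in p.items():
        for (k2, c2, d2), w in q[(k, i, j)].items():
            if (c2, d2) == (c_next, d_next):
                joint[k2] = joint.get(k2, F(0)) + pk * w
    total = sum(joint.values())
    if total == 0:
        raise ValueError("signals have probability zero under the current belief")
    return {k2: w / total for k2, w in joint.items()}
```

Iterating `update_belief` along the observed signals reproduces the ratio of sums over state paths displayed in the proof, since the strategy factors cancel at every step.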


Before building the strategy of player 1, we prove that the transition rule $\ell(\cdot) : X \times A \times B \to
\Delta_f(X)$ of the auxiliary game is linear with respect to $b \in B$ (the action of player 2).

Lemma 5.21 For any $(x,a) \in X \times A$ and $b = \sum_{s \in S} \lambda_s b_s$ a convex combination in $B$, we have

$$\ell(x,a,b) = \sum_{s \in S} \lambda_s\,\ell(x,a,b_s).$$


Proof. Let $(x,a) \in X \times A$ and $b \in B$. Recall that $Q := Q(x,a,b)$ denotes a distribution
in $\Delta_f(K \times (\Delta(K) \times C) \times D)$, which can be seen as an element of $\Delta^1(K \times C' \times D')$ with
$C' = \mathrm{supp}(x) \times C$ a finite subset of $\Delta(K) \times C$ and $D' = D$. We have by definition of the image
mapping $\Phi(\cdot)$: $\ell(x,a,b) = \Phi(Q) = \sum_{d' \in D'} Q(d')\,\delta_{\mathcal{L}_Q(\mathcal{L}_Q(k|c')|d')}$.

Similarly to the previous lemma, for every $(c', d') = ((p,c), d') \in C' \times D'$, $\mathcal{L}_Q\big(\mathcal{L}_Q(k \,|\, (p,c)) \,\big|\, d'\big)$
does not depend on $b$. Indeed, the signal $(c', d')$ contains the action pair $(i,j) = (i(c'), j(d'))$ and
$c'$ contains $d'$ a.s. It follows that

$$\mathbb{P}_Q(k \,|\, p, c) = \frac{a(p)[i(c)]\,b[j(c)]\,q^{K \times C}(p, i(c), j(c))(k, c')}{a(p)[i(c)]\,b[j(c)]\,q^{C}(p, i(c), j(c))(c')} = \frac{q^{K \times C}(p, i(c), j(c))(k, c')}{q^{C}(p, i(c), j(c))(c')}$$

and

$$\mathbb{P}_Q(p, c \,|\, d') = \frac{x(p)\,q^{C}(p, a(p), j(d'))(c)}{\sum_{p \in \mathrm{supp}(x)} x(p)\,q^{D}(p, a(p), j(d'))(d')}.$$

Since these quantities do not depend on $b$, we will not specify $b$ in the following. The map
$Q(x,a,b)$ being linear in $b$, we can easily deduce the announced result:

$$\begin{aligned}
\Phi(Q(x,a,b)) &= \sum_{d' \in D'} Q(x,a,b)(d')\,\delta_{\mathcal{L}_Q(\mathcal{L}_Q(k|c')|d')} \\
&= \sum_{d' \in D'} \Big(\sum_{s \in S} \lambda_s\,Q(x,a,b_s)(d')\Big)\,\delta_{\mathcal{L}_Q(\mathcal{L}_Q(k|c')|d')} \\
&= \sum_{s \in S} \lambda_s\,\Phi(Q(x,a,b_s)).
\end{aligned}$$

We now use the two previous lemmas to prove that player 1 uniformly guarantees w (<b(7r)) 
in the game T(7r). 
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Proposition 5.22 Player 1 uniformly guarantees in r(7r). 

Proof. Fix any $\varepsilon > 0$. We divide the proof into three steps. First, we define the strategy $\sigma^*$ in $\Gamma(\pi)$. Then we show how to link the distribution over the states in $\mathcal{G}$ to the distribution of second-order beliefs in $\Gamma$. Finally, we deduce that the strategy $\sigma^*$ is uniform $\varepsilon$-optimal.

Step I: Defining the strategy. 


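Consider the auxiliary game $\mathcal{G}(z)$ with $z = \Phi(\pi) \in \Delta_f(X)$. According to Proposition 5.18, player 1 has pure uniform $\varepsilon$-optimal strategies which depend on histories only through the states but not the actions. With a slight abuse of notation, there exist $\hat{\sigma}^* : \bigcup_{t \geq 1} X^t \to A = \{a : \Delta(K) \to \Delta(I)\}$ and $N_0 \geq 1$ such that

$$\hat{\gamma}_n(z, \hat{\sigma}^*, \hat{\tau}) \geq w^*_\infty(z) - \varepsilon \quad \text{for all } n \geq N_0 \text{ and for all } \hat{\tau} : \bigcup_{t=1}^{\infty} X^t \to B = \Delta(J),$$

where $\hat{\gamma}_n(z, \hat{\sigma}^*, \hat{\tau})$ is the expected $n$-stage average payoff in the auxiliary game $\mathcal{G}(z)$ induced by $(z, \hat{\sigma}^*, \hat{\tau})$.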

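We define the strategy $\sigma^* \in \Sigma$ in the game $\Gamma(\pi)$ such that for any $h^1_t$,

$$\sigma^*(h^1_t) = \hat{\sigma}^*(x_1, \dots, x_t)[p_t] \quad \text{with } p_t = \mathcal{L}_{\mathbb{P}^{\pi}_{\sigma^*}}(k_t \mid h^1_t) \text{ and } x_t = \mathcal{L}_{\mathbb{P}^{\pi}_{\sigma^*}}(p_t \mid h^2_t).$$

By Lemma 5.20, this is a well-defined strategy of player 1 since he can compute $p_t$ and $x_t$ at every stage $t \geq 1$. We now check that the strategy $\sigma^*$ uniformly guarantees $w^*_\infty(z) - \varepsilon$ in $\Gamma(\pi)$.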

Step II: Linking the probability law of beliefs 


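Let $\tau \in \mathcal{T}$ be a strategy in $\Gamma(\pi)$. We define a strategy $\hat{\tau}$ in $\mathcal{G}(\Phi(\pi))$ such that $(\pi, \sigma^*, \tau)$ and $(\Phi(\pi), \hat{\sigma}^*, \hat{\tau})$ generate the same probability law for $(x_1, \dots, x_t, \dots)$. With a slight abuse of notation, we denote by $\hat{\tau}$ the strategy in $\mathcal{G}$ such that for all $(x_1, \dots, x_t) \in X^t$,

$$\hat{\tau}(x_1, \dots, x_t) = \sum_{h^2_t \in H^2_t(x_1, \dots, x_t)} \mathbb{P}^{\pi}_{\sigma^*, \tau}(h^2_t \mid x_1, \dots, x_t)\, \tau(h^2_t),$$

where $H^2_t(x_1, \dots, x_t) = \{h^2_t \in H^2_t \mid \mathcal{L}_{\mathbb{P}^{\pi}_{\sigma^*}}(p_\ell \mid h^2_\ell) = x_\ell,\ 1 \leq \ell \leq t\}$ denotes the set of player 2's $t$-stage histories in $\Gamma$ that induce the beliefs $(x_1, \dots, x_t)$.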
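The averaging that defines the strategy of player 2 in the auxiliary game can be sketched as follows. This is only an illustration on hypothetical finite data: `history_prob` plays the role of the law of player 2's histories, `belief_path` maps each history to the sequence of beliefs it induces, and `tau` gives player 2's mixed action after each history. None of these names come from the paper.

```python
from collections import defaultdict
from fractions import Fraction as F

def aggregate(history_prob, belief_path, tau):
    """Average tau over the histories inducing the same belief path.

    Returns a dict: belief path -> mixed action (dict action -> probability),
    each history being weighted by its conditional probability given the path.
    """
    mass = defaultdict(F)                        # probability of each belief path
    acc = defaultdict(lambda: defaultdict(F))    # unnormalised mixed actions
    for h, p in history_prob.items():
        s = belief_path[h]
        mass[s] += p
        for j, w in tau[h].items():
            acc[s][j] += p * w
    return {s: {j: v / mass[s] for j, v in acc[s].items()}
            for s in mass if mass[s] > 0}

# Hypothetical data: three histories, two of which induce the same beliefs.
history_prob = {"h1": F(1, 2), "h2": F(1, 4), "h3": F(1, 4)}
belief_path = {"h1": ("x",), "h2": ("x",), "h3": ("y",)}
tau = {
    "h1": {"L": F(1), "R": F(0)},
    "h2": {"L": F(0), "R": F(1)},
    "h3": {"L": F(1, 2), "R": F(1, 2)},
}
tau_hat = aggregate(history_prob, belief_path, tau)
```

In this toy instance the histories `h1` and `h2` are pooled, so the aggregated action after the belief path `("x",)` is the conditional average of the two pure actions.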

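Lemma 5.23 Let $\sigma^*$ and $\hat{\tau}$ be constructed as above, given $\hat{\sigma}^*$ and $\tau$. We have:

$$\forall t \geq 1, \quad \mathcal{L}_{\mathbb{P}^{\pi}_{\sigma^*, \tau}}(x_1, \dots, x_t) = \mathcal{L}_{\mathbb{P}^{z}_{\hat{\sigma}^*, \hat{\tau}}}(x_1, \dots, x_t).$$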

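Proof of Lemma 5.23. We prove the lemma by induction on $t \geq 1$. For $t = 1$, the law of $x_1$ is independent of the strategy profile. By definition of the image mapping $\Phi(\cdot)$,

$$\mathcal{L}_{\mathbb{P}^{\pi}_{\sigma^*, \tau}}(x_1) = \mathcal{L}_{\pi}\big(\mathcal{L}_{\pi}(\mathcal{L}_{\pi}(k_1 \mid c_1) \mid d_1)\big) = \Phi(\pi).$$

As $\Phi(\pi) = z$, the probability law of the initial state $x_1 \in X$ in $\mathcal{G}(z)$ is $\mathcal{L}_{\mathbb{P}^{z}_{\hat{\sigma}^*, \hat{\tau}}}(x_1) = \Phi(\pi)$.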
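Suppose now that we have proved that $\mathcal{L}_{\mathbb{P}^{\pi}_{\sigma^*, \tau}}(\mathbf{x}_1, \dots, \mathbf{x}_t) = \mathcal{L}_{\mathbb{P}^{z}_{\hat{\sigma}^*, \hat{\tau}}}(\mathbf{x}_1, \dots, \mathbf{x}_t)$ for some $t \geq 1$. It is then sufficient to prove that, conditional on any realization$^4$ $s_t := (x_1, \dots, x_t) \in X^t$,

$$\mathcal{L}_{\mathbb{P}^{\pi}_{\sigma^*, \tau}}(\mathbf{x}_{t+1} \mid \mathbf{x}_1 = x_1, \dots, \mathbf{x}_t = x_t) = \mathcal{L}_{\mathbb{P}^{z}_{\hat{\sigma}^*, \hat{\tau}}}(\mathbf{x}_{t+1} \mid \mathbf{x}_1 = x_1, \dots, \mathbf{x}_t = x_t).$$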

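Fix some $s_t = (x_1, \dots, x_t) \in X^t$. By definition of $\hat{\tau}$ and the linearity of $\ell$ shown in Lemma 5.21, we know that

$$\mathcal{L}_{\mathbb{P}^{z}_{\hat{\sigma}^*, \hat{\tau}}}(\mathbf{x}_{t+1} \mid \mathbf{x}_1 = x_1, \dots, \mathbf{x}_t = x_t) = \ell(x_t, \hat{\sigma}^*(s_t), \hat{\tau}(s_t)) = \sum_{h^2_t \in H^2_t(s_t)} \mathbb{P}^{\pi}_{\sigma^*, \tau}(h^2_t \mid s_t)\, \ell(x_t, \hat{\sigma}^*(s_t), \tau(h^2_t)). \quad (5.2)$$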

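By definition of the conditional expectation, we have in $\Gamma$,

$$\mathcal{L}_{\mathbb{P}^{\pi}_{\sigma^*, \tau}}(\mathbf{x}_{t+1} \mid \mathbf{x}_1 = x_1, \dots, \mathbf{x}_t = x_t) = \sum_{h^2_t \in H^2_t(s_t)} \mathbb{P}^{\pi}_{\sigma^*, \tau}(h^2_t \mid s_t)\, \mathcal{L}_{\mathbb{P}^{\pi}_{\sigma^*, \tau}}(\mathbf{x}_{t+1} \mid h^2_t).$$

Thus, it is sufficient to prove that for every $h^2_t \in H^2_t(s_t)$, $\ell(x_t, \hat{\sigma}^*(s_t), \tau(h^2_t)) = \mathcal{L}_{\mathbb{P}^{\pi}_{\sigma^*, \tau}}(\mathbf{x}_{t+1} \mid h^2_t)$. Let $h^2_t \in H^2_t(s_t)$ and let $Q[h^2_t] := Q(x_t, \hat{\sigma}^*(s_t), \tau(h^2_t)) \in \Delta_f(K \times (\Delta(K) \times C) \times D)$ be the joint distribution of $(k_{t+1}, (p_t, c_{t+1}), d_{t+1})$ in the canonical game associated with $x_t$ when $(\hat{\sigma}^*(s_t), \tau(h^2_t)) \in A \times B$ is played. By definition of the image mapping $\Phi(\cdot)$ and of $\sigma^*$, we obtain

$$\mathcal{L}_{\mathbb{P}^{\pi}_{\sigma^*, \tau}}(\mathbf{x}_{t+1} \mid h^2_t) = \mathcal{L}_{Q[h^2_t]}\big(\mathcal{L}_{Q[h^2_t]}(\mathcal{L}_{Q[h^2_t]}(k_{t+1} \mid c'_{t+1}) \mid d_{t+1})\big) = \Phi(Q[h^2_t]) = \ell(x_t, \hat{\sigma}^*(s_t), \tau(h^2_t)).$$

□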

Step III: Conclusion of the proof 


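Finally, let us compare the payoffs in both games. If $k^* \in K^*$, we have $G(k^*) = g(k^*) = \mathbb{E}^{\pi}_{\sigma^*, \tau}[g(k_t) \mid x_t = k^*]$. If $x_t \in \Delta_f(\Delta(K^0))$, we have $G(x_t) = 0 = \mathbb{E}^{\pi}_{\sigma^*, \tau}[g(k_t) \mid x_t]$. It follows that for every $x_t \in X$, we have $G(x_t) = \mathbb{E}^{\pi}_{\sigma^*, \tau}[g(k_t) \mid x_t]$. By taking conditional expectation, Lemma 5.23 implies that $\mathbb{E}^{z}_{\hat{\sigma}^*, \hat{\tau}}[G(x_t)] = \mathbb{E}^{\pi}_{\sigma^*, \tau}[g(k_t)]$. Since $\hat{\sigma}^*$ is uniform $\varepsilon$-optimal in the auxiliary game $\mathcal{G}(\Phi(\pi))$, we obtain

$$\gamma_n(\pi, \sigma^*, \tau) = \hat{\gamma}_n(\Phi(\pi), \hat{\sigma}^*, \hat{\tau}) \geq w^*_\infty(\Phi(\pi)) - \varepsilon \quad \text{for all } n \geq N_0.$$

Therefore, the strategy $\sigma^*$ uniformly guarantees $w^*_\infty(\Phi(\pi)) - \varepsilon$ in $\Gamma(\pi)$. ■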
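To make the payoff comparison concrete, recall that in a recursive game the stage payoff is 0 on active states and fixed after absorption, so the $n$-stage average payoff only counts the stages played after absorption. The sketch below computes this average in a hypothetical one-parameter example, not taken from the paper: the play starts in an active state and, at each active stage, is absorbed with probability `p` in a state of payoff `g`.

```python
from fractions import Fraction as F

def n_stage_payoff(p, g, n):
    """Average over stages t = 1..n of E[g(k_t)].

    Assumptions of this toy model: k_1 is active (payoff 0), and each active
    stage leads to absorption with probability p in a state of payoff g.
    """
    total = F(0)
    for t in range(1, n + 1):
        absorbed_by_t = 1 - (1 - p) ** (t - 1)   # P(k_t is absorbing)
        total += absorbed_by_t * g
    return total / n

payoffs = [n_stage_payoff(F(1, 2), 1, n) for n in (1, 2, 3)]
```

In this example the average increases with $n$ towards `g`, since absorption occurs almost surely and the stages before absorption contribute 0.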


5.4.4 Player 2 uniformly defends $w^*_\infty$

We now prove that player 2 can defend $w^*_\infty(\Phi(\pi)) = \lim v_n(\pi) = \lim v_\lambda(\pi)$. The situation of player 2 is different since he is allowed to know the strategy of player 1. In order to prove this result, we introduce another auxiliary recursive game $\mathcal{R}$.

For any $n \geq 1$, let $H'_n \subset H_n$ be the set of $n$-stage histories such that player 1 can deduce player 2's private signals, and let $H^0_n \subset H_n$ be the set of $n$-stage histories containing only non-absorbing states. We consider the following game $\mathcal{R}$, where the set of states is almost the set of distributions over all finite histories. It is defined as follows:

4 For this part of the proof it is convenient to differentiate the random variable describing the second-order belief (or the state in $\mathcal{G}$), denoted by $\mathbf{x}_t$, from its realization, denoted by $x_t$.
• the state space is $Z = Z_0 \cup K^*$ where $Z_0 = \bigcup_{n \geq 1} \Delta(H^0_n \cap H'_n)$,

• the action space of player 1 is $A = \bigcup_{n \geq 1} \{f : H^1_n \to \Delta(I)\}$ and for any $\pi_n \in \Delta(H_n)$,

$$A(\pi_n) = \{f : H^1_n \to \Delta(I)\},$$

• the action space of player 2 is $B = \bigcup_{n \geq 1} \{f : H^2_n \to \Delta(J)\}$ and for any $\pi_n \in \Delta(H_n)$,

$$B(\pi_n) = \{f : H^2_n \to \Delta(J)\},$$

• the transition $Q : Z \times A \times B \to \Delta_f(Z)$ is given by:

$$\forall (k^*, a, b) \in K^* \times A \times B, \quad Q(k^*, a, b) = \delta_{k^*},$$

and

$$\forall (z, a, b) \in Z_0 \times A \times B, \quad Q(z, a, b) = Q^0(z, a, b)\, \delta_{\pi^0} + \sum_{k^* \in K^*} Q(z, a, b)(k^*)\, \delta_{k^*},$$

where $Q(z, a, b)(k^*)$ is the probability of absorption in state $k^*$ at the next stage, given by

$$Q(z, a, b)(k^*) = \sum_{h_n, i, j, c, d} z(h_n)\, a(h^1_n)[i]\, b(h^2_n)[j]\, q(k_n, i, j)(k^*, c, d),$$

$Q^0(z, a, b)$ is the probability of no absorption, given by

$$Q^0(z, a, b) = \sum_{h_n, i, j, c, d}\ \sum_{k \in K^0} z(h_n)\, a(h^1_n)[i]\, b(h^2_n)[j]\, q(k_n, i, j)(k, c, d),$$

and $\pi^0 \in Z_0$ is the conditional probability given that no absorption has occurred, i.e.,

$$\forall (h_n, k, i, j, c, d) \in H_n \times K \times I \times J \times C \times D, \quad \pi^0(h_n, k, i, j, c, d) = \frac{z(h_n)\, a(h^1_n)[i]\, b(h^2_n)[j]\, q(k_n, i, j)(k, c, d)}{Q^0(z, a, b)},$$

• the stage payoff function $R : Z \times A \times B \to [-1, +1]$ is given by

$$\forall (\pi_n, a, b) \in Z_0 \times A \times B, \quad R(\pi_n, a, b) = 0,$$

and

$$\forall (k^*, a, b) \in K^* \times A \times B, \quad R(k^*, a, b) = g(k^*).$$
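The transition of the game defined above can be illustrated by a short computation: starting from a distribution over non-absorbed histories, one stage splits the mass between the absorbing states and the extended histories, and the latter are renormalised. The sketch below is only a toy instance under simplifying assumptions. Histories are tuples whose last three entries are (state, signal of player 1, signal of player 2), both players are assumed to observe the whole history, and every name in the code (`step`, `q`, `"win"`, and so on) is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction as F

def step(z, a, b, q, absorbing):
    """One transition from a distribution z over non-absorbed histories.

    a[h] and b[h] are mixed actions after history h; q[(k, i, j)] is the law
    of (next state, signal c, signal d). Returns the absorption probabilities,
    the probability of no absorption, and the conditional law of the extended
    history given no absorption.
    """
    absorb = defaultdict(F)
    stay = defaultdict(F)
    for h, ph in z.items():
        k_n = h[-3]  # current state: third entry from the end of the history
        for i, pi in a[h].items():
            for j, pj in b[h].items():
                for (k, c, d), pq in q[(k_n, i, j)].items():
                    w = ph * pi * pj * pq
                    if w == 0:
                        continue
                    if k in absorbing:
                        absorb[k] += w
                    else:
                        stay[h + (i, j, k, c, d)] += w
    p_stay = sum(stay.values(), F(0))
    pi0 = {h: w / p_stay for h, w in stay.items()} if p_stay > 0 else {}
    return dict(absorb), p_stay, pi0

# Hypothetical data: one initial history, one absorbing state "win".
h0 = ("k0", "c0", "d0")
z = {h0: F(1)}
a = {h0: {"T": F(1, 2), "B": F(1, 2)}}
b = {h0: {"L": F(1)}}
q = {
    ("k0", "T", "L"): {("k0", "c0", "d0"): F(1, 2), ("win", "c0", "d0"): F(1, 2)},
    ("k0", "B", "L"): {("k0", "c1", "d0"): F(1)},
}
absorb, p_stay, pi0 = step(z, a, b, q, absorbing={"win"})
```

The absorption probabilities and the probability of no absorption sum to one, and the returned conditional law is again a distribution over longer non-absorbed histories, so the same function can be applied at the next stage.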

By construction, the game $\mathcal{R}$ is recursive. We denote by $\tilde{\Sigma}$ (resp. $\tilde{\mathcal{T}}$) the set of behavior strategies for player 1 (resp. for player 2) in the game $\mathcal{R}$.

Proposition 5.24 For every $\pi \in Z_0$ and every $n \geq 1$, the $n$-stage game $\mathcal{R}_n(\pi)$ has a value in history-independent pure strategies, which is denoted by $\tilde{v}_n(\pi)$, and

$$\tilde{v}_n(\pi) = v_n(\pi).$$

Moreover, if player 2 can uniformly defend some payoff level $v$ in the game $\mathcal{R}(\pi)$ with pure strategies, then he can also uniformly defend $v$ in the game $\Gamma(\pi)$.
Proof. First, a strategy $\sigma$ of player 1 in $\Gamma$ is a sequence of mappings $(\sigma_n)_{n \geq 1}$ such that $\sigma_n$ is a mapping from $H^1_n$ to $\Delta(I)$. By definition, this is a sequence of actions in the game $\mathcal{R}$, i.e., a history-independent pure strategy in $\mathcal{R}$. Similarly, a strategy $\tau$ of player 2 in $\Gamma$ induces a sequence of actions in $\mathcal{R}$. By definition of $Q$ and $R$, it follows that for every $\pi \in Z_0$, $\sigma \in \Sigma$ and $\tau \in \mathcal{T}$,

$$\tilde{\gamma}_n(\pi, \sigma, \tau) = \gamma_n(\pi, \sigma, \tau). \quad (5.3)$$

Let $\sigma$ be an optimal strategy of player 1 in the game $\Gamma_n(\pi)$. Consider now a pure strategy $\tilde{\tau} \in \tilde{\mathcal{T}}$. The triple $(\pi, \sigma, \tilde{\tau})$ generates a probability distribution $\mathbb{P}$ on $(Z \times A \times B)^{\mathbb{N}}$ such that there exists at most one play $(\pi_t, a_t, b_t)_{t \geq 1}$ that is non-absorbing $\mathbb{P}$-a.s., i.e., $(\pi_t, a_t, b_t)_{t \geq 1} \in (Z_0 \times A \times B)^{\mathbb{N}}$. Define the strategy $\tau \in \mathcal{T}$ of player 2 in $\Gamma(\pi)$ by setting $\tau_t = b_t$ for all $t \geq 1$. We obtain

$$\tilde{\gamma}_n(\pi, \sigma, \tilde{\tau}) = \tilde{\gamma}_n(\pi, \sigma, \tau) = \gamma_n(\pi, \sigma, \tau) \geq v_n(\pi).$$

Therefore, player 1 guarantees the payoff $v_n(\pi)$ in $\mathcal{R}_n(\pi)$ with the history-independent pure strategy $\sigma$. Similarly, player 2 can guarantee $v_n(\pi)$ with a history-independent pure strategy, and $\tilde{v}_n(\pi) = v_n(\pi)$.

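Finally, let us assume that player 2 can uniformly defend the payoff level $v$ with pure strategies in the game $\mathcal{R}(\pi)$. Let $\varepsilon > 0$ and $\sigma \in \Sigma$. Interpreting $\sigma$ as a history-independent strategy in $\mathcal{R}$, there exist $N_0 \geq 1$ and a pure strategy $\tilde{\tau} \in \tilde{\mathcal{T}}$ such that

$$\forall n \geq N_0, \quad \tilde{\gamma}_n(\pi, \sigma, \tilde{\tau}) \leq v + \varepsilon. \quad (5.4)$$

As in the previous paragraph, we can associate to the triple $(\pi, \sigma, \tilde{\tau})$ a unique play $(\pi_t, a_t, b_t)_{t \geq 1}$ in $(Z_0 \times A \times B)^{\mathbb{N}}$ and define the strategy $\tau \in \mathcal{T}$ of player 2 in $\Gamma(\pi)$ by setting $\tau_t = b_t$ for all $t \geq 1$. We obtain

$$\forall n \geq N_0, \quad \gamma_n(\pi, \sigma, \tau) = \tilde{\gamma}_n(\pi, \sigma, \tau) = \tilde{\gamma}_n(\pi, \sigma, \tilde{\tau}) \leq v + \varepsilon.$$

This proves that player 2 can uniformly defend $v$ in $\Gamma(\pi)$. ■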

We conclude by showing that the game $\mathcal{R}$ fulfills the conditions of Corollary 3.4.

Proposition 5.25 Player 2 uniformly defends $w^*_\infty(\Phi(\pi))$ in $\Gamma(\pi)$.

Proof. We already noticed that the game $\mathcal{R}$ is recursive. Let $\pi \in \Delta(H^0_n \cap H'_n) \subset Z_0$ for some $n \geq 1$. Since player 1 is more informed than player 2 ($\pi$ is supported on $H'_n$), $\pi$ can be identified with an element of $\Delta^1(K \times C' \times D')$ for some finite $C'$ and $D'$. By Proposition 5.24, we obtain that for any $\pi \in Z_0$,

$$\tilde{v}_n(\pi) = v_n(\pi) = \hat{v}_n(\Phi(\pi)).$$

According to Corollary 5.15, the family $\{\hat{v}_n, n \geq 1\}$, considered as functions on $B_2$, is totally bounded, and so is the family of their linear extensions to $\Delta_f(B_2)$.

By Corollary 3.4, $\mathcal{R}(\pi)$ has a uniform value $w^*_\infty(\Phi(\pi))$ in pure strategies for every $\pi \in \Delta^1(K \times C \times D)$. It follows from Proposition 5.24 that player 2 can uniformly defend $w^*_\infty(\Phi(\pi))$ in $\Gamma(\pi)$. ■